INTRODUCTION


This  report  describes  a  PASCAL  compiler  which  is  rather 
unique  in  that  its  target  language  is  the  lambda-calculus 
instead  of  some  machine  code.  Although  the  object  code  that  it 
generates  can  be  executed  by  means  of  a  lambda-expression- 
reducer  [3]  (resembling  a  pure  LISP  interpreter),  the  intended 
use  of  the  code  is  in  proving  programs  correct. 

The compiler is written in PASCAL itself and contains an
attributed LL(1) parser [7] of the complete standard PASCAL
language [5]. Its error recovery is quite elaborate and it
provides substantially better error diagnostics than several
existing standard PASCAL compilers [9]. It produces code for a
large subset of standard PASCAL, covering most programs that are
interesting from the theoretical program-verification viewpoint.
The translated features include multidimensional arrays,
assignment statements in their full generality, I/O statements,
compound, conditional and repetitive statements, and procedures
with recursive calls and global side effects.

This  report  is  divided  into  two  parts:  The  first  part  gives 
a  formal  definition  of  the  version  of  lambda-calculus  used  as 
the  target  language,  and  describes  the  representation  rules  to 
translate  PASCAL  programs  into  the  lambda-calculus.  The  second 
part contains a code-independent description of compilation
algorithms including the complete LL(1) push-down automaton. It is
assumed  that  the  reader  is  familiar  with  the  basic  ideas  of  the 
lambda-calculus  [4,10]  and  the  top-down  parsing  methods  [7]. 


PART I: THE TARGET LANGUAGE


1.1.  Introduction 

The  target  language  of  the  compiler  is  a  slightly  modified 
form  of  the  lambda-calculus  [4,10].  The  structure  of  the  PASCAL 
source  program  will  be  partially  preserved  by  the  translation 
into  this  language.  Theoretically,  an  object  program  could  be 
converted  into  a  single  lambda-expression.  However,  this  is 
undesirable  since  the  resulting  code  will  lack  clarity  and  will 
be  inefficient  for  later  automatic  evaluation.  Furthermore  there 
is  a  ("software")  machine  which  will  execute  this  language  in  a 
slightly  different  syntactic  setting  [3].  This  part  introduces 
the  essential  concepts  of  the  calculus  and  its  modelling 
capabilities  for  PASCAL  programs.  For  a  detailed  discussion  of 
the lambda-calculus model of ALGOL-like programming languages,
the  reader  is  referred  to  [1,2]. 


1.2.  The  Lambda-Calculus 


Adopting  a  commonly  used  terminology  [1],  the  syntax  of  the 
lambda-calculus  is  given  by  the  following  BNF-definition: 

(1) <indeterminate> ::= <PASCAL-identifier including '$' in the
                        letter set>

(2) <lambda-expression> ::= <indeterminate>

(3) <lambda-expression> ::= <application>

(4) <lambda-expression> ::= <abstraction>

(5) <application> ::= (<lambda-expression> <lambda-expression>)

(6) <abstraction> ::= (%<binding indeterminates>:<lambda-expression>)*

(7) <binding indeterminates> ::= <indeterminate>

When using multi-character symbols, it may be necessary to
separate the lambda-expressions in production (5) by blank
spaces. Blank spaces are also allowed wherever no syntactic unit
is split apart. The lambda-expression in production (6) is the
scope of the preceding binding indeterminates. An occurrence of
an indeterminate within a given lambda-expression is bound if it
lies in the scope of a binding indeterminate of the same name; it
is bound only by the innermost such binding indeterminate.
Otherwise its occurrence is free in this lambda-expression. If e,
f1, f2,..., fn are lambda-expressions and x1, x2,..., xn are
pairwise distinct indeterminates+, then

sub[f1,x1; f2,x2;...; fn,xn; e]

denotes the result of simultaneously substituting fi for all
free occurrences of xi (1 ≤ i ≤ n) in e.


* Note that the percent sign is used to denote the Greek letter
lambda, because the printer on which this document is being
produced is not equipped with a Greek character font.

The  lambda-calculus  contains  the  following  contraction  and 
expansion  rules: 


Alpha-conversion (renaming bound variables):

(% x: e) --(alpha)--> (% y: sub[y,x; e]) provided that
y has no free occurrence in e.

Beta-contraction (substitution):

((% x: e) f) --(beta)--> sub[f,x; e] if no free
indeterminates in f occur bound in e.

Eta-contraction (extensionality):

(% x: (e x)) --(eta)--> e if x does not occur free in
e.++

The converses of beta- and eta-contractions are called
beta- and eta-expansions, respectively. A (possibly empty)
sequence of contractions and alpha-conversions is called a
reduction (denoted by "-->"). Conversion ("<-->") also includes
expansions. An irreducible lambda-expression cannot be beta- or
eta-contracted further. If a lambda-expression e can be
converted into an irreducible lambda-expression f, then f is
uniquely determined up to alpha-conversions and, furthermore, e
--> f. The leftmost (outermost) computation rule is safe in the
sense that it always leads to this irreducible lambda-expression
("normal form") provided that this expression exists. Most
results may be proved using the Church-Rosser Theorem [4]: If e
<--> g, then there is a conversion from e into g in which no
expansion precedes any contraction.
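
The contraction rules and the leftmost (outermost) computation rule can be tried out with a small reducer. The following is only a minimal sketch in Python, not the evaluator of [3]: the tuple encoding of lambda-expressions, the function names and the step limit are assumptions made for illustration, and eta-contraction is left out. Where the side condition of beta-contraction fails, the sketch first applies an alpha-conversion to a fresh name.

```python
import itertools

# Assumed encoding: ("var", name), ("app", f, a), ("lam", x, body).
_fresh = itertools.count()

def free_vars(t):
    """Set of indeterminates occurring free in t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def subst(t, x, f):
    """sub[f,x; t], renaming a bound variable when f would be captured."""
    if t[0] == "var":
        return f if t[1] == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, f), subst(t[2], x, f))
    y, body = t[1], t[2]
    if y == x:
        return t                      # x is rebound here: nothing is free
    if y in free_vars(f) and x in free_vars(body):
        z = f"{y}${next(_fresh)}"      # alpha-conversion to a fresh name
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, f))

def step(t):
    """One leftmost-outermost beta-contraction, or None if irreducible."""
    if t[0] == "app":
        fn, arg = t[1], t[2]
        if fn[0] == "lam":
            return subst(fn[2], fn[1], arg)
        r = step(fn)
        if r is not None:
            return ("app", r, arg)
        r = step(arg)
        return None if r is None else ("app", fn, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t, limit=1000):
    """Reduce t to normal form, giving up after `limit` contractions."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step limit")
```

Because the outermost redex is contracted first, an argument without a normal form does no harm as long as it is discarded, which is the sense in which the rule is called safe above.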

The  following  lemma  suggests  a  useful  extension  of  the 
syntax  for  lambda-expressions: 

Lemma:

(...((% x1:(% x2:(...(% xn: e)...))) f1)... fn) -->
sub[f1,x1; f2,x2;...; fn,xn; e] provided no free indeterminate
in any of the fi's is bound in e.


+ In the following, e, f, g,... will denote lambda-expressions
and x, y,... indeterminates. Unless stated otherwise, they are
always assumed universally quantified in definitions and
theorems of the meta-language.

++ This rule will not be applicable to the lambda-expressions
generated by the compiler.

Therefore  the  syntax  of  lambda-expressions  will  be  extended  by 
allowing  a  list  of  indeterminates  in  abstractions,  viz. 

(8) <binding indeterminates> ::= <binding indeterminates>,<indeterminate>

This notation can be viewed as a shorthand for nested abstractions:

(% x1,x2,...,xn: e) means (% x1:(% x2:(...(% xn: e)...))).

However,  an  automatic  evaluator  can  use  the  above  lemma  for  a 
faster  substitution  algorithm  for  lambda-expressions  in  this 
form. 
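
Productions (1) to (8) are small enough to be read by a recursive-descent parser. The sketch below is an illustration in Python and is not taken from the compiler described in this report; the tuple encoding and the name parse are assumptions. It expands a list of binding indeterminates into nested abstractions, exactly as the shorthand above prescribes.

```python
import re

# Tokens: the five punctuation marks, or a run of letters, digits and '$'.
TOKEN = re.compile(r"[()%:,]|[A-Za-z0-9$]+")

def parse(text):
    """Parse one lambda-expression into nested tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return ("var", tok)               # productions (1) and (2)
        if tokens[pos] == "%":                # productions (6) to (8)
            pos += 1
            names = [tokens[pos]]
            pos += 1
            while tokens[pos] == ",":
                names.append(tokens[pos + 1])
                pos += 2
            assert tokens[pos] == ":", "expected ':' after the binders"
            pos += 1
            body = expr()
            assert tokens[pos] == ")", "expected ')'"
            pos += 1
            for name in reversed(names):      # (% x1,x2: e) = (% x1:(% x2: e))
                body = ("lam", name, body)
            return body
        fn = expr()                           # production (5)
        arg = expr()
        assert tokens[pos] == ")", "expected ')'"
        pos += 1
        return ("app", fn, arg)

    tree = expr()
    assert pos == len(tokens), "trailing input"
    return tree
```

An evaluator that wants to exploit the lemma would instead keep the binder list intact and substitute for all listed indeterminates at once; the sketch deliberately does the simpler thing.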


1.3.  Systems  of  Lambda-Expression  Definitions 

It  is  a  convenient  practice  to  define  a  certain  name*  to  be 
a  representative  of  a  given  lambda-expression.  Then  this  name 
may  be  used  many  times  without  rewriting  its  whole  definition. 
The  following  productions  complete  the  syntax  of  the  target 
language: 

(9) <list of definitions> ::= <definition>

(10) <list of definitions> ::= <list of definitions><definition>

(11) <definition> ::= <name>=<lambda-expression>.

(12) <lambda-expression> ::= <name>

(13) <name> ::= <PASCAL-identifier including '$' in the letter
                set>

An "object" program is then a <list of definitions>. At
this point a single lambda-expression cannot always be
recovered by merely replacing all names by their corresponding
lambda-expressions, because some names may be referenced
recursively, that is, on the right-hand sides of their own
definitions. Before this problem can be resolved, some basic
lambda-expressions must be introduced. Since the compiler
generates source-program-dependent names, special or predefined
names will always be distinguished from these by a preceding
'$' character. This is the reason for including the '$' sign in
the PASCAL letter set in productions (1) and (13).


*  Names  act  like  variables  in  the  language.  They  are  distinctly 
different  from  indeterminates  in  that  the  language  does  not 
contain  any  rules  indicating  what  objects  indeterminates 
represent  or  how  they  may  "vary"  throughout  a  calculation.  By 
specifying  theorems  about  the  language,  indeterminates  often 
attain  the  property  of  meta-variables.  Therefore  "name"  was 
chosen  to  avoid  possible  confusion. 
$ID = (% X: X).

The  identity  ("do-nothing")  function  and  empty  list. 

$CAT = (% X,Y: (% Z: ((X Z) Y))).

The concatenation of objects. If (% X: ((((X y1) y2)
...) yn)) represents a list of n elements, $CAT will
append an element to a list of this structure.

$OMEGA = ((% X,Y: (X X)) (% X,Y: (X X))).

The undefined value. It should be noted that ($OMEGA f)
--> $OMEGA but $OMEGA does not possess a normal form.

$Y = (% X:((% Y:(X (Y Y))) (% Y:(X (Y Y))))).

The recursion operator. Since ($Y g) <--> (g ($Y g)),
($Y g) is a solution to the recursive definition F =
(g F), provided that g does not contain the name F.
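
The list convention behind $ID and $CAT can be imitated with ordinary closures. This is a sketch in Python under the assumption that curried one-argument functions stand in for lambda-expressions; the names ID, CAT and collect3 are illustrative only.

```python
ID = lambda x: x                                   # $ID, also the empty list
CAT = lambda X: lambda Y: lambda Z: X(Z)(Y)        # $CAT = (% X,Y:(% Z:((X Z) Y)))

# Build (% X: (((X 1) 2) 3)) by appending to the empty list three times.
lst = CAT(CAT(CAT(ID)(1))(2))(3)

# A list is "read" by applying it to a function that consumes its elements.
collect3 = lambda a: lambda b: lambda c: (a, b, c)
elements = lst(collect3)
```

Appending costs one application because the list is itself the function that feeds its elements, in order, to whatever it is applied to.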

$Y  can  now  be  employed  to  model  general  recursion.  First  it 
is  necessary  to  beta-expand  the  right-hand  sides  of  recursive 
definitions  into  the  form 

((((g <name>)<name>)...)<name>)

such that no name occurs inside the lambda-expression g and all
recursively referenced names are listed in a given order. An
explicit solution of the system of definitions

n1 = ((((g1 n1) n2)...) nk).
n2 = ((((g2 n1) n2)...) nk).
  ...
nk = ((((gk n1) n2)...) nk).

is determined by

ni = ((((Y[i,k] g1) g2)...) gk), with

XZ-list = (% Y:((...((Y (X Z1)) (X Z2))...) (X Zk))), and
Y[i,k] = (% Z1,...,Zk:(($Y (% X: XZ-list)) (% X1,...,Xk: Xi))).

This  is  also  the  least  fixed  point  solution  under  a  certain 
ordering [8]. However, it should be noted that an automatic
evaluator  will  work  more  efficiently  by  replacing  names 
recursively  during  execution  time  rather  than  introducing  the  $Y 
operator  beforehand. 
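
The fixed-point property ($Y g) <--> (g ($Y g)) can be observed in a conventional language, with one caveat: Python evaluates arguments eagerly, so a literal transcription of $Y would loop forever. The sketch below therefore swaps in the eta-expanded variant usually called the Z combinator; this substitution, and the names Z and fact_step, are assumptions of the example and not part of the target language.

```python
# Z = (% g: ((% y: (g (% v: ((y y) v)))) (% y: (g (% v: ((y y) v))))))
Z = lambda g: (lambda y: g(lambda v: y(y)(v)))(lambda y: g(lambda v: y(y)(v)))

# g for the recursive definition F = (g F); g itself does not mention F.
fact_step = lambda F: lambda n: 1 if n == 0 else n * F(n - 1)

fact = Z(fact_step)          # a solution of F = (fact_step F)
```

Here fact(5) evaluates to 120, and fact_step(fact) behaves like fact itself, which is the fixed-point equation read from right to left.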

1.4.  Primitives  in  the  Model 


It is possible to represent natural numbers and arithmetic
operations in the lambda-calculus [4]. This representation can
also be extended to (signed) integers and much of computer
arithmetic [1].

But instead of defining them as lambda-expressions, we
accept a number of arithmetical and logical constants and
operators as primitives in our model. The reduction
characteristics of these primitives reflect the algebraic
properties of the corresponding objects (viz. the integers and
the logical values). An evaluator program can simulate these
primitives with computer-internal arithmetic operations rather
than using their lambda-calculus definitions, thus achieving a
considerable gain in speed.

The  compiled  program  may  contain  the  following  names 
associated  with  primitives: 

0, 1, 2,...    The non-negative integers. These are the only
               names syntactically different from identifiers.
               n, m,... will denote lambda-expressions reducing
               to the integers n, m,... or their names.
$MINUSUNARY    Integer negation.
$PLUS          Integer addition.
$MINUS         Integer subtraction.
$MULT          Integer multiplication.
$DIV           Integer division.
$TRUE          Boolean value true. It is assumed that
               (($TRUE g) h) --> g.
$FALSE         Boolean value false. It is assumed that
               (($FALSE g) h) --> h.
$NOT           Boolean negation.
$AND           Boolean conjunction.
$OR            Boolean disjunction.
$EQ            Integer comparison equal.
$NE            Integer comparison not equal.
$GT            Integer comparison greater.
$GE            Integer comparison greater than or equal to.
$LT            Integer comparison less.
$LE            Integer comparison less than or equal to.


The  following  three  primitives  are  used  for  array  handling. 
An n-dimensional PASCAL array is treated as a vector of
(n-1)-dimensional arrays, with 0-dimensional arrays treated as scalar
objects.  Vectors  will  be  treated  as  lists  in  the  object  language 
(see  $CAT) .  For  the  definition  of  these  primitives  in  the 
lambda-calculus,  see  [1]. 


$TUPINIT       Initialization of an array. Its reduction
               property is (...(($TUPINIT n) m1)... mn) -->
               "list of m1 lists of m2 lists of ... mn $OMEGAs".

$RETRIEVE      Indexing of vector elements. Its reduction
               property is (f (($RETRIEVE i) k)) --> "i-th
               element of f", provided that f is a list of k
               items (lambda-expressions).

$REPLACE       Assigning a vector element. Its reduction
               property is (f ((($REPLACE i) k) g)) --> "list of
               k items where all but the i-th element are copied
               from f and the i-th position is g", provided that
               f is a list of k elements.
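
The reduction properties of the three array primitives can be restated operationally. The sketch below models lists as Python tuples, which is an assumption of this example (in the object language they are lambda-expressions in $CAT form), and the names tupinit, retrieve, replace and OMEGA are illustrative. The dimension count n that $TUPINIT takes explicitly is implied here by the number of arguments.

```python
OMEGA = object()   # placeholder for the undefined value $OMEGA

def tupinit(*dims):
    """(...(($TUPINIT n) m1)... mn): nested lists filled with OMEGA."""
    if not dims:
        return OMEGA
    return tuple(tupinit(*dims[1:]) for _ in range(dims[0]))

def retrieve(f, i, k):
    """(f (($RETRIEVE i) k)): the i-th element of a list of k items."""
    assert len(f) == k and 1 <= i <= k
    return f[i - 1]

def replace(f, i, k, g):
    """(f ((($REPLACE i) k) g)): copy of f with g in the i-th position."""
    assert len(f) == k and 1 <= i <= k
    return f[:i - 1] + (g,) + f[i:]
```

Note that replace returns a new list and leaves f unchanged, which matches the purely functional reading of assignment used in the representation rules.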

In  the  lambda-calculus,  characters  may  be  modelled  by  their 
corresponding  numerical  code.  In  order  to  distinguish  the  codes 
from  numbers  a  primitive  equivalent  to  the  standard  PASCAL 
function  "CHR"  is  introduced: 

CHR            Code character. In the pure lambda-calculus,
               CHR = $ID.

1.5.  Functional  Semantics  of  Variables  and  Statements 

Each statement of a PASCAL program may be thought of as
operating on two different entities: a set of variables (global
and local) addressable at the time the statement is being
executed (its "environment"), and some sort of register
indicating which place in the program is currently being
executed. It is not hard to imagine that this register may
contain a possibly recursive description of the entire portion
of the program not executed so far (the "continuation" or the
"program remainder"). With this view a statement acts more like
a functional, since one of its arguments, the continuation,
itself turns out to be a function. A statement can then be
translated into an abstraction with respect to the continuation,
denoted by the indeterminate "$PHI", and the environment
variables, denoted by their PASCAL identifiers whenever
possible. If imported and local identifiers coincide, conflicts
will be resolved by appending "$" and the proper block level
number to these identifiers. If all continuations and current
values of variables are arranged in a certain list form, this
abstraction will not become too complex to construct.

In  the  following,  a  representation  rule  [2]  of  the  form 

{S}/(v1, v2,..., vn) = abstraction,

where S is a statement and (v1, v2,..., vn) is its environment,
will  be  used  to  describe  which  kind  of  abstractions  model  these 
statements,  and  to  give  a  more  concise  expression  to  the 
underlying  ideas.  As  the  compiler  defines  each  abstraction  of 
the  statement  i  by  the  name  "$STMi",  representation  rules  can 
also  be  seen  as  patterns  for  these  definitions.  For  a  discussion 
of  how  representation  rules  are  derived,  see  [2]. 
1.6. Compound Statements and Blocks

Compound  statements  are  compositions  of  functions.  This 
suggests  the  following  representation  rule: 

{begin S1; S2;...; Sn end}/E =

(% $PHI:({S1}/E ({S2}/E (...({Sn}/E $PHI)...)))).

It  should  be  noted  that  the  program  remainder  of  each  Si  is  the 
first  operand  applied  to  the  statement.  Therefore  --  and  this  is 
true  for  all  representations  --  a  statement  representation 
merely  has  to  substitute  this  first  operand  into  a  place  where 
it  will  become  applicable  after  the  statement's  reduction  is 
finished.  In  the  following,  the  environments  will  not  be 
explicitly  specified  if  they  stay  the  same  throughout  a 
representation  rule. 
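
The composition rule for compound statements can be sketched with closures. In the Python fragment below every statement is a curried function that takes its program remainder first and the environment values afterwards; the two sample statements, the two-variable environment (x, y) and all names are hypothetical and serve only to show the order of composition.

```python
from functools import reduce

def compound(*stmts):
    """{begin S1;...;Sn end} = (% $PHI: (S1 (S2 (...(Sn $PHI)...))))."""
    return lambda phi: reduce(lambda k, s: s(k), reversed(stmts), phi)

# Hypothetical statements over the environment (x, y):
set_x_5 = lambda phi: lambda x: lambda y: phi(5)(y)             # x := 5
set_y_x_plus_1 = lambda phi: lambda x: lambda y: phi(x)(x + 1)  # y := x + 1

final = lambda x: lambda y: (x, y)    # last continuation: report the state
program = compound(set_x_5, set_y_x_plus_1)
result = program(final)(0)(0)
```

Each statement only has to hand the (possibly updated) values to its first operand, so swapping the two statements changes the outcome exactly as it would in PASCAL.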

Blocks  introduce  new  (local)  variables,  initialize  them  to 
an  undefined  value  and  delete  them  from  the  environment  after 
execution of their body. Let E be the global and F=(u1,..., um,
v1,..., vn) be the local environment (identifier conflicts
already resolved):

{var u1:<type1>;...; um:<typem>; begin S1;...; Sk end}/E =

(% $PHI:(...((({S1}/F ({S2}/F (...({Sk}/F (% u1,...,um: $PHI))
...))) {init u1}) {init u2})... {init um}));
where {init u} is $OMEGA if u is a scalar variable,
               is (($TUPINIT 1) p) if u is a vector of p items,
               is ((($TUPINIT 2) p1) p2) if u is a p1*p2 matrix.


As  a  statement  is  translated  into  an  abstraction  of  $PHI  and  the 
environment  variables,  all  current  values  of  these  variables 
must  follow  the  continuation  before  and  also  after  the 
abstraction  was  reduced.  It  should  be  noted  that  an  attempt  to 
reference  an  undefined  value  will  result  in  an  infinite 
reduction  sequence  due  to  a  property  of  $OMEGA. 

1.7.  Expressions  and  Assignments 

So  far  the  compilation  of  only  integers  and  Boolean 
constants  has  been  specified.  But  since  scalar  identifiers  and 
characters  can  be  identified  with  the  ordinal  numbers  encoding 
them,  all  scalar  constants  suitable  for  lambda-calculus 
representation can be compiled. Entire variables are
indeterminates of their own identifiers with ambiguities removed. As a
slight  restriction  only  unary  and  binary  operators  on  scalar 
operands  are  accepted.  Due  to  the  list-like  translation  of 
arrays,  records  can  be  viewed  as  a  special  form  of  arrays  and 
their  field  identifiers  as  indices. 
Rational and real numbers are not treated here (see [1]).
Literals  and  sets  are  part  of  PASCAL  because  they  allow  a  very 
efficient  implementation  on  a  binary  computer,  but  could  be 
simulated  in  the  lambda-calculus  only  by  a  rather  clumsy 
representation.  Also,  pointer  variables  have  been  omitted  since 
their  representation  will  be  quite  complex.  Files  will  be 
treated  in  section  1.8. 

The  representation  rules  for  expressions  (without  function 
calls)  heavily  involve  recursion.  The  most  significant  are 
sketched  below.  In  some  instances  a  nesting  level  number  is  used 
as  a  superscript  on  matching  pairs  of  parentheses  to  make  the 
rules  more  readable: 

Since our model requires prefix operators, the following
representation rules are essentially infix to prefix translations:

{<expr.1> <binary operator> <expr.2>} =
(("primitive of binary oper." {<expr.1>}) {<expr.2>}).

{<unary operator> <expression>} =
("primitive of unary oper." {<expression>}).

For arrays in list form, it is assumed that the index origin is
at 1. Therefore the compiler has to translate all index
references explicitly to this origin. Let v be an array
[lb..hb], b an array [BOOLEAN] and d an array [lb1..hb1,
lb2..hb2]:

{v [<expression>]} =
(v (¹(²$RETRIEVE (³($MINUS {<expression>}) lb-1³)²) hb-lb+1¹)).

{b [<expression>]} =
(b (¹(²$RETRIEVE (³({<expression>} 1) 2³)²) 2¹)).

{d [<expr.1>, <expr.2>]} =
((¹d (²(³$RETRIEVE (⁴($MINUS {<expr.1>}) lb1-1⁴)³)
hb1-lb1+1²)¹) (⁵(⁶$RETRIEVE (⁷($MINUS {<expr.2>}) lb2-1⁷)⁶)
hb2-lb2+1⁵)).


In  the  following,  CHR  and  ORD  are  the  standard  PASCAL  functions 
on  scalars: 

{'<character>'} = (CHR "order of this character").

{CHR(<expression>)} = (CHR {<expression>}).

{ORD(<expression>)} = {<expression>}.

{-n} = ($MINUSUNARY n).

An assignment statement will be translated into a
substitution of the right-hand expression for the left-hand
variable position in the list of corresponding indeterminates.
Assignments to elements of an array complicate this process
somewhat:

{vi := <expression>}/E =

(% $PHI, v1,..., vn: (...(((...(($PHI v1) v2)... vi-1)
{<expression>}/E) vi+1)... vn)).

{v [<expr.1>] := <expr.2>}/E =

(% $PHI, v1,..., vn: (...(((...(($PHI v1) v2)... vj-1)
(¹v (²(³(⁴$REPLACE (⁵($MINUS {<expr.1>}/E) lb-1⁵)⁴) hb-lb+1³)
{<expr.2>}/E²)¹)) vj+1)... vn)).

{d [<expr.1>, <expr.2>] := <expr.3>}/E =

(% $PHI, v1,..., vn: (...(((...(($PHI v1) v2)... vk-1)
assign) vk+1)... vn));
where assign is the lambda-expression:

(d (⁰(¹(²$REPLACE (³($MINUS {<expr.1>}/E) lb1-1³)²) hb1-lb1+1¹)
(⁴(⁵d (⁶(⁷$RETRIEVE (⁸($MINUS {<expr.1>}/E) lb1-1⁸)⁷)
hb1-lb1+1⁶)⁵) (⁹(¹⁰(¹¹$REPLACE (¹²($MINUS {<expr.2>}/E)
lb2-1¹²)¹¹) hb2-lb2+1¹⁰) {<expr.3>}/E⁹)⁴)⁰)).
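
The index arithmetic in these rules is easy to get wrong by one, so a numeric sketch may help. The Python fragment below again models lists as tuples (an assumption of the example, as are all function names) and follows the rules literally: an index e becomes e - (lb-1), the list length is hb-lb+1, and a two-dimensional assignment retrieves the affected row, replaces one element in it and puts the new row back.

```python
def retrieve(f, i, k):
    assert len(f) == k and 1 <= i <= k
    return f[i - 1]

def replace(f, i, k, g):
    assert len(f) == k and 1 <= i <= k
    return f[:i - 1] + (g,) + f[i:]

def assign_1d(v, lb, hb, e, value):
    """v[e] := value for v: array [lb..hb]."""
    return replace(v, e - (lb - 1), hb - lb + 1, value)

def assign_2d(d, lb1, hb1, lb2, hb2, e1, e2, value):
    """d[e1, e2] := value for d: array [lb1..hb1, lb2..hb2]."""
    k1, k2 = hb1 - lb1 + 1, hb2 - lb2 + 1
    row = retrieve(d, e1 - (lb1 - 1), k1)
    new_row = replace(row, e2 - (lb2 - 1), k2, value)
    return replace(d, e1 - (lb1 - 1), k1, new_row)

# Sample bounds: a vector [2..5] and a matrix [-3..0, 0..5].
v = assign_1d((0, 0, 0, 0), 2, 5, 4, 9)
d = assign_2d(tuple((None,) * 6 for _ in range(4)), -3, 0, 0, 5, -2, 2, "Q")
```

With lb = 2 the index 4 maps to list position 3, and with lb1 = -3 the index -2 maps to position 2, so the subtraction of lb-1 is what moves every index origin to 1.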


1.8.  Files  and  Input-Output 

As the I/O facilities in the lambda-calculus model are
rather simple, only the two standard files INPUT and OUTPUT are
supported during compilation. Unlike in standard PASCAL, these
files are files of INTEGER. Their representations in the
lambda-calculus are naturally lists, denoted by the indeterminate
$SCARDS for INPUT and $SPRINT for OUTPUT. Their initial values
are all input items coded as lambda-expressions for $SCARDS and
$ID (the empty list) for $SPRINT. The current file pointers
INPUT@ and OUTPUT@ will be treated as integer indeterminates of
the same name ("@" omitted). Furthermore, only the two predefined
I/O routines GET and PUT are accepted by the code generation
routines of the compiler. Upon call, data is transferred between
the files and their associated file pointers. Contrary to
standard PASCAL, the input file is not automatically reset, which
means that INPUT@ contains $OMEGA at the beginning of a program
and not the first data integer of $SCARDS.

In the following representation rules, one should notice
that INPUT and OUTPUT are automatically adjoined to the
environment G=(v1,..., vn, INPUT, OUTPUT) of a statement if their
corresponding files appear in the program head:
{GET}/G =

(% $PHI, v1,..., vn, OUTPUT, INPUT:(% $SPRINT, $SCARDS:
((((...(($PHI v1) v2)... vn) OUTPUT) $SCARDS) $SPRINT))).

{PUT}/G =

(% $PHI, v1,..., vn, OUTPUT, INPUT:(% $SPRINT:((((...(($PHI
v1) v2)... vn) OUTPUT) INPUT) (($CAT $SPRINT) OUTPUT)))).

These  representations  only  require  $SPRINT  to  be  arranged  in 
list  format  (see  definition  of  $CAT)  whereas  the  input  elements 
merely  have  to  follow  this  output  list.  The  representation  of  a 
complete  program  which  the  compiler  names  $PROGRAM  follows: 

{program...; S1.}/() = (({S1}/() $ID) $ID);

where S1 is the outermost begin-end pair and () the empty list.
The first $ID is the final program remainder and the second the
empty $SPRINT. $PROGRAM has the property that

(($PROGRAM i1)... ip) --> (% X:((X o1)... oq))

where o1,..., oq are the output numbers which would be obtained
by executing the program on the input numbers i1,..., ip.
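
How the output list and the input items are threaded through a program can be seen on a toy example. The Python sketch below models a hypothetical one-variable program (GET; I:=INPUT@; OUTPUT@:=I+1; PUT) with curried closures. The helper curry, the statement names and the use of None for $OMEGA are assumptions of the example; the argument order follows the generated code of section 1.9, where the output list is bound before the next input item.

```python
def curry(n, f, args=()):
    """Turn an n-argument function into a chain of one-argument functions."""
    if n == 0:
        return f(*args)
    return lambda a: curry(n - 1, f, args + (a,))

OMEGA = None                                   # stand-in for $OMEGA
ID = lambda x: x                               # $ID, also the empty list
CAT = curry(3, lambda X, Y, Z: X(Z)(Y))        # $CAT

# Environment G = (I, OUTPUT, INPUT); every statement takes $PHI first.
GET = curry(6, lambda phi, I, OUT, INP, sprint, scards:
            phi(I)(OUT)(scards)(sprint))
PUT = curry(5, lambda phi, I, OUT, INP, sprint:
            phi(I)(OUT)(INP)(CAT(sprint)(OUT)))
I_FROM_INPUT = curry(4, lambda phi, I, OUT, INP: phi(INP)(OUT)(INP))
OUT_FROM_I_PLUS_1 = curry(4, lambda phi, I, OUT, INP: phi(I)(I + 1)(INP))

def block(phi):
    """Outermost block: run the body, then drop the local variables."""
    drop_locals = curry(3, lambda I, OUT, INP: phi)
    body = GET(I_FROM_INPUT(OUT_FROM_I_PLUS_1(PUT(drop_locals))))
    return body(OMEGA)(OMEGA)(OMEGA)

PROGRAM = block(ID)(ID)        # (({S1}/() $ID) $ID)
output_list = PROGRAM(41)      # apply the program to its single input item
```

The result is a list in $CAT form, so it is read by applying it to a consumer: output_list(lambda o: o) yields the single output number 42.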

1.9. Example #1

The following sample program illustrates all the concepts
described so far. The statement numbers listed are referred
to in the generated code later on:

Stmnr  Source code:

       (*$U+,X- superscripts, no cross reference *)
       PROGRAM EXAMPLE1(INPUT, OUTPUT);
       CONST
         LB=2; HB=5;   (* bounds for V *)
         LB1=-3; HB1=0;
         LB2=0; HB2=5; (* bounds for D *)
       TYPE
         SC=(ONE,TWO,THREE);
         LET='A'..'Z';
       VAR
         I: INTEGER; C: LET; S: SC;
         V: ARRAY[LB..HB] OF INTEGER;
         B: ARRAY[BOOLEAN] OF TWO..THREE;
         D: ARRAY[LB1..HB1, LB2..HB2] OF CHAR;
         (* 3 dimensions! *)
         P: ARRAY[LET, SC] OF ARRAY[2..7] OF TRUE..FALSE;
       BEGIN
         (* The following statements make no sense *)
         (* but illustrate the compilation *)
         GET; I:=INPUT@;
  4      V[I+1]:=-(I+1);
  5      I:=I+V[(LB+HB) DIV 2]*I;
  7      OUTPUT@:=V[4]; PUT;
  8      B[FALSE]:=THREE;
  9      S:=B[NOT(I<>0) AND (V[I]<I+1)];
 10      BEGIN
 11        D[-2, 1*2]:='Q';
 12        C:=D[HB1-2][V[3]]
 12      END;
 12      (* some difficult assignments *)
 13      P[C,S,I]:=(I<=2) OR (C='B');
 14      B[P[C,S,I]]:=TWO
 14    END.

The compiler generated the following code. Optionally, matching
parentheses are identified by superscripts. An asterisk in
column one signals a comment line, which should be ignored by
automatic evaluators.

* LAMBDA CODE FOR EXAMPLE1

$STM2=(¹¹% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁰% $SPRINT,$SCARDS:
(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI P⁰)D¹)B²)V³)S⁴)C⁵)I⁶)OUTPUT⁷)$SCARDS⁸)
$SPRINT⁹)¹⁰)¹¹).

$STM3=(⁹% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI
P⁰)D¹)B²)V³)S⁴)C⁵)INPUT⁶)OUTPUT⁷)INPUT⁸)⁹).

$STM4=(¹⁴% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹³(¹²(¹¹(¹⁰(⁹(⁸(²(¹(⁰
$PHI P⁰)D¹)B²)(⁷V(⁶(⁵(⁴$REPLACE (³(²$MINUS (¹(⁰$PLUS I⁰)1¹)²)1³)
⁴)4⁵)(²$MINUSUNARY (¹(⁰$PLUS I⁰)1¹)²)⁶)⁷)⁸)S⁹)C¹⁰)I¹¹)OUTPUT¹²)
INPUT¹³)¹⁴).

$STM5=(¹⁵% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁴(¹³(¹²(⁵(⁴(³(²(¹(⁰
$PHI P⁰)D¹)B²)V³)S⁴)C⁵)(¹¹(⁰$PLUS I⁰)(¹⁰(⁹$MULT (⁸V(⁷(⁶$RETRIEVE
(⁵(⁴$MINUS (³(²$DIV (¹(⁰$PLUS 2⁰)5¹)²)2³)⁴)1⁵)⁶)4⁷)⁸)⁹)I¹⁰)¹¹)¹²)
OUTPUT¹³)INPUT¹⁴)¹⁵).

$STM6=(⁹% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI
P⁰)D¹)B²)V³)S⁴)C⁵)I⁶)(⁴V(³(²$RETRIEVE (¹(⁰$MINUS 4⁰)1¹)²)4³)⁴)⁷)
INPUT⁸)⁹).

$STM7=(¹¹% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁰% $SPRINT:(⁹(⁸(⁷(⁶
(⁵(⁴(³(²(¹(⁰$PHI P⁰)D¹)B²)V³)S⁴)C⁵)I⁶)OUTPUT⁷)INPUT⁸)(($CAT
$SPRINT)OUTPUT)⁹)¹⁰)¹¹).

$STM8=(¹³% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(¹(⁰
$PHI P⁰)D¹)(⁵B(⁴(³(²$REPLACE (¹(⁰$FALSE 1⁰)2¹)²)2³)2⁴)⁵)⁶)V⁷)S⁸)
C⁹)I¹⁰)OUTPUT¹¹)INPUT¹²)¹³).

$STM9=(¹⁸% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁷(¹⁶(¹⁵(¹⁴(¹³(³(²(¹
(⁰$PHI P⁰)D¹)B²)V³)(¹²B(¹¹(¹⁰$RETRIEVE (⁹(⁸(⁷(³$AND (²$NOT (¹(⁰
$NE I⁰)0¹)²)³)(⁶(⁵$LT (⁴V(³(²$RETRIEVE (¹(⁰$MINUS I⁰)1¹)²)4³)⁴)⁵)
(¹(⁰$PLUS I⁰)1¹)⁶)⁷)1⁸)2⁹)¹⁰)2¹¹)¹²)¹³)C¹⁴)I¹⁵)OUTPUT¹⁶)INPUT¹⁷)
¹⁸).

$STM11=(¹⁸% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁷(¹⁶(¹⁵(¹⁴(¹³(¹²(¹¹
(¹⁰(⁰$PHI P⁰)(⁹D(⁸(⁴(³$REPLACE (²(¹$MINUS (⁰$MINUSUNARY 2⁰)¹)(⁰
$MINUSUNARY 4⁰)²)³)4⁴)(⁷(⁵D(⁴(³$RETRIEVE (²(¹$MINUS (⁰$MINUSUNARY
2⁰)¹)(⁰$MINUSUNARY 4⁰)²)³)4⁴)⁵)(⁶(⁵(⁴$REPLACE (³(²$MINUS (¹(⁰
$MULT 1⁰)2¹)²)(⁰$MINUSUNARY 1⁰)³)⁴)6⁵)(⁰CHR 216⁰)⁶)⁷)⁸)⁹)¹⁰)B¹¹)
V¹²)S¹³)C¹⁴)I¹⁵)OUTPUT¹⁶)INPUT¹⁷)¹⁸).

$STM12=(¹⁴% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹³(¹²(¹¹(¹⁰(⁴(³(²(¹
(⁰$PHI P⁰)D¹)B²)V³)S⁴)(⁹(⁶D(⁵(⁴$RETRIEVE (³(²$MINUS (¹(⁰$MINUS
0⁰)2¹)²)(⁰$MINUSUNARY 4⁰)³)⁴)4⁵)⁶)(⁸(⁷$RETRIEVE (⁶(⁵$MINUS (⁴V(³
(²$RETRIEVE (¹(⁰$MINUS 3⁰)1¹)²)4³)⁴)⁵)(⁰$MINUSUNARY 1⁰)⁶)⁷)6⁸)⁹)
¹⁰)I¹¹)OUTPUT¹²)INPUT¹³)¹⁴).

$STM10=(²% $PHI:(¹$STM11(⁰$STM12 $PHI⁰)¹)²).

$STM13=(²⁰% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁹(¹⁸(¹⁷(¹⁶(¹⁵(¹⁴(¹³
(¹²(¹¹$PHI (¹⁰P(⁹(³(²$REPLACE (¹(⁰$MINUS C⁰)192¹)²)41³)(⁸(⁴P(³
(²$RETRIEVE (¹(⁰$MINUS C⁰)192¹)²)41³)⁴)(⁷(¹(⁰$REPLACE S⁰)2¹)(⁶(⁵
(⁴P(³(²$RETRIEVE (¹(⁰$MINUS C⁰)192¹)²)41³)⁴)(¹(⁰$RETRIEVE S⁰)2¹)
⁵)(⁴(³(²$REPLACE (¹(⁰$MINUS I⁰)1¹)²)6³)(³(²$OR (¹(⁰$LE I⁰)2¹)²)
(¹(⁰$EQ C⁰)(⁰CHR 194⁰)¹)³)⁴)⁶)⁷)⁸)⁹)¹⁰)¹¹)D¹²)B¹³)V¹⁴)S¹⁵)C¹⁶)
I¹⁷)OUTPUT¹⁸)INPUT¹⁹)²⁰).

$STM14=(²⁰% $PHI,P,D,B,V,S,C,I,OUTPUT,INPUT:(¹⁹(¹⁸(¹⁷(¹⁶(¹⁵(¹⁴(¹³
(¹(⁰$PHI P⁰)D¹)(¹²B(¹¹(¹⁰(⁹$REPLACE (⁸(⁷(⁶(⁵(⁴P(³(²$RETRIEVE (¹
(⁰$MINUS C⁰)192¹)²)41³)⁴)(¹(⁰$RETRIEVE S⁰)2¹)⁵)(³(²$RETRIEVE (¹
(⁰$MINUS I⁰)1¹)²)6³)⁶)1⁷)2⁸)⁹)2¹⁰)1¹¹)¹²)¹³)V¹⁴)S¹⁵)C¹⁶)I¹⁷)
OUTPUT¹⁸)INPUT¹⁹)²⁰).

$STM1=(¹²% $PHI:(((((((((¹¹$STM2(¹⁰$STM3(⁹$STM4(⁸$STM5(⁷$STM6(⁶
$STM7(⁵$STM8(⁴$STM9(³$STM10(²$STM13(¹$STM14(⁰% P,D,B,V,S,C,I,
OUTPUT,INPUT:$PHI⁰)¹)²)³)⁴)⁵)⁶)⁷)⁸)⁹)¹⁰)¹¹)(³(²(¹(⁰$TUPINIT 3⁰)
41¹)2²)6³))(²(¹(⁰$TUPINIT 2⁰)4¹)6²))(¹(⁰$TUPINIT 1⁰)2¹))(¹(⁰
$TUPINIT 1⁰)4¹))$OMEGA)$OMEGA)$OMEGA)$OMEGA)$OMEGA)¹²).

$PROGRAM=(($STM1 $ID)$ID).

1.10.  Conditional  Statements 


The reduction properties of $TRUE and $FALSE mentioned
earlier, together with the definition $IF=$ID, imply a
straightforward representation of if-statements:

{if <expression> then S1 else S2}/E =

(% $PHI, v1,..., vn: (...((((($IF {<expression>}/E) {S1}/E)
{S2}/E) $PHI) v1)... vn)).

{if <expression> then S1}/E =

(% $PHI, v1,..., vn: (...((((($IF {<expression>}/E) {S1}/E)
$ID) $PHI) v1)... vn)).

Case-statements could be represented as a sequence of
if-statements.
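
The selection performed by $TRUE and $FALSE can be mimicked directly. The following Python sketch assumes a one-variable environment and uses an ordinary Python predicate for the condition; the names if_else, if_then, negate and absolute are hypothetical. It shows in particular why $ID serves as the missing else-branch: applied to the program remainder, it simply returns that remainder.

```python
TRUE = lambda g: lambda h: g      # (($TRUE g) h) --> g
FALSE = lambda g: lambda h: h     # (($FALSE g) h) --> h
ID = lambda x: x
IF = ID                           # $IF = $ID

def if_else(cond, s1, s2):
    """(% $PHI, x: ((((($IF cond) S1) S2) $PHI) x)) for one variable x."""
    return lambda phi: lambda x: IF(TRUE if cond(x) else FALSE)(s1)(s2)(phi)(x)

def if_then(cond, s1):
    """Same rule with $ID as the alternate clause."""
    return if_else(cond, s1, ID)

negate = lambda phi: lambda x: phi(-x)         # x := -x
absolute = if_then(lambda x: x < 0, negate)    # if x < 0 then x := -x
final = lambda x: x                            # last continuation
```

Both branches are statements, so whichever one is selected receives the same program remainder and the same variable values.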


1.11.  Repetitive  Statements 

Any PASCAL loop can be transformed into a while loop. For
instance, repeat S until <expression> is equivalent to begin S;
while not <expression> do S end. While-statements themselves lead
to recursive definitions. Let i be the statement number of the
loop being represented:

{while <expression> do S}/E =

$STMi=(% $PHI, v1,..., vn:(...((¹(²(³$IF {<expression>}/E³)
(⁴{S}/E (⁵$STMi $PHI⁵)⁴)²) $PHI¹) v1)... vn)).

The small but essential difference between this if-construction
and the one in the previous section 1.10 should be observed: The
alternate clause has to be $PHI instead of $ID.
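
The recursive definition of a while-statement can be run as it stands if the recursion is resolved by name at execution time, as suggested at the end of section 1.3. The Python sketch below does this for a hypothetical loop over the environment (i, acc) that adds the numbers i, i-1,..., 1 to acc; all names are assumptions of the example.

```python
TRUE = lambda g: lambda h: g
FALSE = lambda g: lambda h: h
IF = lambda b: b                   # $IF = $ID

# Loop body: acc := acc + i; i := i - 1
body = lambda phi: lambda i: lambda acc: phi(i - 1)(acc + i)

def loop(phi):
    """$STMi = (% $PHI, i, acc: ((((($IF cond) (S ($STMi $PHI))) $PHI) i) acc))."""
    def stm(i):
        cond = TRUE if i > 0 else FALSE           # {i > 0}
        return lambda acc: IF(cond)(body(loop(phi)))(phi)(i)(acc)
    return stm

final = lambda i: lambda acc: acc   # last continuation: report acc
total = loop(final)(4)(0)
```

When the condition is false the selector yields $PHI itself, which is then applied to the variable values; this is the difference from the if-then rule that the text points out.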


1.12. Example #2


The  following  program  illustrates  compilation  of  while 
loops  and  if  statements: 

Stmnr  Source code:

       (*$U+,X- superscripts, no cross reference *)
       PROGRAM SORT(INPUT, OUTPUT);
       CONST LB=4; HB=9;
       VAR A: ARRAY[LB..HB] OF INTEGER;
           I, J, TEMP: INTEGER;
           NC: BOOLEAN;
       BEGIN
         I:=LB;
         WHILE (I<=HB) DO
         BEGIN
           GET;
           A[I]:=INPUT@
         END;
         J:=HB;
         NC:=FALSE;
         WHILE (J>LB) AND NOT NC DO
         BEGIN
           I:=LB;
           NC:=TRUE;
           WHILE (I<J) DO
           BEGIN
             IF A[I]>A[I+1]
             THEN BEGIN
               TEMP:=A[I];
               A[I]:=A[I+1];
               A[I+1]:=TEMP;
               NC:=FALSE
             END;
             I:=I+1
           END;
           J:=J-1
         END;
         I:=LB;
         WHILE (I<=HB) DO
         BEGIN
           OUTPUT@:=A[I];
           PUT
         END
       END.


*  LAMBDA  CODE  FOR  SORT 

$STM2=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)J²)4³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM5=(⁹% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁸% $SPRINT,$SCARDS:(⁷
(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)TEMP¹)J²)I³)A⁴)OUTPUT⁵)$SCARDS⁶)$SPRINT⁷)⁸)⁹).

$STM6=(⁹% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁸(⁷(⁶(³(²(¹(⁰$PHI NC⁰)
TEMP¹)J²)I³)(⁵A(⁴(³(²$REPLACE (¹(⁰$MINUS I⁰)3¹)²)6³)INPUT⁴)⁵)⁶)
OUTPUT⁷)INPUT⁸)⁹).

$STM4=(²% $PHI:(¹$STM5(⁰$STM6 $PHI⁰)¹)²).

$STM3=(¹²% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(
²$IF(¹(⁰$LE I⁰)9¹)²)(¹$STM4(⁰$STM3 $PHI⁰)¹)³)$PHI⁴)NC⁵)TEMP⁶)J⁷)
I⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM7=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)9²)I³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM8=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI
$FALSE⁰)TEMP¹)J²)I³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM11=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)J²)4³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM12=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI
$TRUE⁰)TEMP¹)J²)I³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM17=(¹¹% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹⁰(⁹(⁸(⁷(⁶(⁵(⁰$PHI
NC⁰)(⁴A(³(²$RETRIEVE (¹(⁰$MINUS I⁰)3¹)²)6³)⁴)⁵)J⁶)I⁷)A⁸)OUTPUT⁹)
INPUT¹⁰)¹¹).

$STM18=(¹²% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹¹(¹⁰(⁹(³(²(¹(⁰$PHI
NC⁰)TEMP¹)J²)I³)(⁸A(⁷(³(²$REPLACE (¹(⁰$MINUS I⁰)3¹)²)6³)(⁶A(⁵(⁴
$RETRIEVE (³(²$MINUS (¹(⁰$PLUS I⁰)1¹)²)3³)⁴)6⁵)⁶)⁷)⁸)⁹)OUTPUT¹⁰)
INPUT¹¹)¹²).

$STM19=(¹¹% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹⁰(⁹(⁸(³(²(¹(⁰$PHI
NC⁰)TEMP¹)J²)I³)(⁷A(⁶(⁵(⁴$REPLACE (³(²$MINUS (¹(⁰$PLUS I⁰)1¹)²)3
³)⁴)6⁵)TEMP⁶)⁷)⁸)OUTPUT⁹)INPUT¹⁰)¹¹).

$STM20=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI
$FALSE⁰)TEMP¹)J²)I³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM16=(⁴% $PHI:(³$STM17(²$STM18(¹$STM19(⁰$STM20 $PHI⁰)¹)²)³)⁴).

$STM15=(¹⁹% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹⁸(¹⁷(¹⁶(¹⁵(¹⁴(¹³(¹²
(¹¹(¹⁰(⁹(⁸$IF(⁷(⁵$GT (⁴A(³(²$RETRIEVE (¹(⁰$MINUS I⁰)3¹)²)6³)⁴)⁵)
(⁶A(⁵(⁴$RETRIEVE (³(²$MINUS (¹(⁰$PLUS I⁰)1¹)²)3³)⁴)6⁵)⁶)⁷)⁸)
$STM16⁹)$ID¹⁰)$PHI¹¹)NC¹²)TEMP¹³)J¹⁴)I¹⁵)A¹⁶)OUTPUT¹⁷)INPUT¹⁸)¹⁹).

$STM21=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)J²)(¹(⁰$PLUS I⁰)1¹)³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM14=(²% $PHI:(¹$STM15(⁰$STM21 $PHI⁰)¹)²).

$STM13=(¹²% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³
(²$IF(¹(⁰$LT I⁰)J¹)²)(¹$STM14(⁰$STM13 $PHI⁰)¹)³)$PHI⁴)NC⁵)TEMP⁶)
J⁷)I⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM22=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)(¹(⁰$MINUS J⁰)1¹)²)I³)A⁴)OUTPUT⁵)INPUT⁶)⁷).

$STM10=(⁴% $PHI:(³$STM11(²$STM12(¹$STM13(⁰$STM22 $PHI⁰)¹)²)³)⁴).

$STM9=(¹⁴% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹³(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(
⁵(⁴$IF(³(²$AND (¹(⁰$GT J⁰)4¹)²)(⁰$NOT NC⁰)³)⁴)(¹$STM10(⁰$STM9
$PHI⁰)¹)⁵)$PHI⁶)NC⁷)TEMP⁸)J⁹)I¹⁰)A¹¹)OUTPUT¹²)INPUT¹³)¹⁴).

$STM23=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)J²)4³)A⁴)OUTPUT⁵)INPUT⁶)⁷).


$STM26=(⁷% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁶(⁵(⁴(³(²(¹(⁰$PHI NC⁰)
TEMP¹)J²)I³)A⁴)(⁴A(³(²$RETRIEVE (¹(⁰$MINUS I⁰)3¹)²)6³)⁴)⁵)
INPUT⁶)⁷).

$STM27=(⁹% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(⁸% $SPRINT:(⁷(⁶(⁵(⁴(
³(²(¹(⁰$PHI NC⁰)TEMP¹)J²)I³)A⁴)OUTPUT⁵)INPUT⁶)(($CAT $SPRINT)
OUTPUT)⁷)⁸)⁹).

$STM25=(²% $PHI:(¹$STM26(⁰$STM27 $PHI⁰)¹)²).

$STM24=(¹²% $PHI,NC,TEMP,J,I,A,OUTPUT,INPUT:(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³
(²$IF(¹(⁰$LE I⁰)9¹)²)(¹$STM25(⁰$STM24 $PHI⁰)¹)³)$PHI⁴)NC⁵)TEMP⁶)
J⁷)I⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM1=(⁸% $PHI:(((((((⁷$STM2(⁶$STM3(⁵$STM7(⁴$STM8(³$STM9(²$STM23
(¹$STM24(⁰% NC,TEMP,J,I,A,OUTPUT,INPUT:$PHI⁰)¹)²)³)⁴)⁵)⁶)⁷)
$OMEGA)$OMEGA)$OMEGA)$OMEGA)(¹(⁰$TUPINIT 1⁰)6¹))$OMEGA)$OMEGA)⁸).

$PROGRAM=(($STM1 $ID)$ID).

1.13.  Procedures 

The  full  modelling  of  procedures  containing  different  kinds 
of  parameter  references  (call  by  name,  value,  reference),  global 
side  effects  and  possible  recursive  invocations  constitutes  a 
major challenge to functional semantics [1,2]. Indeed, the
actual implementations are fairly involved. At this stage of
development,  only  procedures  with  parameters  passed  by  value  are 
accepted  by  the  code  generation  part  of  the  compiler,  but  side- 
effects  and  recursive  calls  are  permitted.  This  type  of 
procedure  will  henceforth  be  referred  to  as  a  "V-procedure". 

V-procedure  definitions  will  be  represented  by  names  for 
blocks,  and  their  parameters  will  be  initialized  dynamically 
with  the  argument  values  passed: 

{V-procedure p(a1: <type1>; ... ; ak: <typek>); <block>}/E =
p = (% $VAL$a1,..., $VAL$ak: {<block>}/E).

Inside  the  block  representation  (see  section  1.6)  a  new  initial 
value  is  chosen  for  all  formal  parameters  ai : 

{ init  ai }  is  $VAL$ai  for  all  value  parameters  ai . 


In  the  case  that  the  environments  of  the  calling  statement 
and  the  V-procedure  definition  are  the  same,  the  representation 
of  the  call  is  simple: 

{p(<expr.1>,..., <expr.k>)}/E =

(% $PHI, v1,..., vn: (...(((...(p {<expr.1>}/E)...
{<expr.k>}/E) $PHI) v1)... vn)).

Should  the  environments  differ  (e.g.  if  the  V-procedure  is 
called  recursively),  all  additional  variables  of  the  calling 
environment  have  to  be  disposed  of  during  the  V-procedure 
execution  and  recovered  upon  return.  This  is  done  by  including 
them  into  the  continuation  of  the  calling  statement  and  re¬ 
establishing  them  when  this  continuation  is  accessed  by  the 
reduction process. Let G=(u1,..., um, v1,..., vn) be the calling
and E=(v1,..., vn) the procedure environments.

{p(<expr.1>,..., <expr.k>)}/G =

(% $PHI, u1,..., um, v1,..., vn: (...(((...(p {<expr.1>}/E)...
{<expr.k>}/E) (...(($PHI u1) u2)... um)) v1)... vn)).

This  representation  solves  the  so-called  environment  conflict 
problem  [ 2 ] . 
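
How the additional variables travel inside the continuation can be seen in a small Python sketch. It is a simplified illustration under several assumptions: a hypothetical V-procedure addup(k) with one value parameter, a single global variable total, an assignment and a recursive call merged into one step, and a native Python conditional in place of $IF.

```python
def addup(val_k):
    """p = (% $VAL$k: {<block>}/E) for a V-procedure with value parameter k."""
    def block(phi):
        # The block initialises k with $VAL$k. Its inner continuation
        # (lambda k: phi) drops the local k before control returns to phi.
        return body(lambda k: phi)(val_k)
    return block


def body(phi):
    """Body in the calling environment G = (k, total):
       if k <> 0 then begin total := total + k; addup(k - 1) end"""
    def with_k(k):
        def with_total(total):
            if k != 0:
                # Recursive call: the extra variable k of G is stored in the
                # continuation phi(k); only the shared variable total is passed.
                return addup(k - 1)(phi(k))(total + k)
            return phi(k)(total)
        return with_total
    return with_k


result = addup(4)(lambda total: total)(0)
print(result)  # 10
```

The partial application phi(k) plays the role of (...(($PHI u1) u2)... um): it already holds the caller's private variables and only waits for the variables of the procedure environment.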

1.14.  Example#3 

The  following  example  shows  how  procedures  and  their  calls 
will  be  translated: 


Stmnr  Source code:

       (*$U+,X-  superscripts, no cross reference *)
       PROGRAM EXAMPLE3(OUTPUT);
       VAR I, J: INTEGER;
       PROCEDURE P1(K, L: INTEGER);
       VAR M: INTEGER;
       BEGIN
  2      IF 0<>L
  2      THEN BEGIN
  6        M:=K; K:=L; L:=M MOD L;
  7        P1(K, L)
  7      END
  8      ELSE OUTPUT@:=K
  8    END; (* P1 *)
  8    PROCEDURE GCD(I, J: INTEGER);
 11    BEGIN P1(I, J); END; (* GCD *)
 11    BEGIN
 14      I:=28; J:=7;
 15      GCD(I, J+14);
 15      PUT
 16    END.

*  LAMBDA  CODE  FOR  EXAMPLE3

P1=(⁰% $VAL$L,$VAL$K:$STM1⁰).

$STM4=(⁶% $PHI,M,K,L,J,I,OUTPUT:(⁵(⁴(³(²(¹(⁰$PHI K⁰)K¹)L²)J³)I⁴)
OUTPUT⁵)⁶).

$STM5=(⁶% $PHI,M,K,L,J,I,OUTPUT:(⁵(⁴(³(²(¹(⁰$PHI M⁰)L¹)L²)J³)I⁴)
OUTPUT⁵)⁶).

$STM6=(⁶% $PHI,M,K,L,J,I,OUTPUT:(⁵(⁴(³(²(¹(⁰$PHI M⁰)K¹)(¹(⁰$MOD
M⁰)L¹)²)J³)I⁴)OUTPUT⁵)⁶).

$STM7=(⁷% $PHI,M,K,L,J,I,OUTPUT:(⁶(⁵(⁴(³(¹(⁰P1 K⁰)L¹)(²(¹(⁰$PHI
M⁰)K¹)L²)³)J⁴)I⁵)OUTPUT⁶)⁷).

$STM3=(⁴% $PHI:(³$STM4(²$STM5(¹$STM6(⁰$STM7 $PHI⁰)¹)²)³)⁴).

$STM8=(⁶% $PHI,M,K,L,J,I,OUTPUT:(⁵(⁴(³(²(¹(⁰$PHI M⁰)K¹)L²)J³)I⁴)
K⁵)⁶).

$STM2=(¹²% $PHI,M,K,L,J,I,OUTPUT:(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²$IF (¹(⁰
$NE 0⁰)L¹)²)$STM3³)$STM8⁴)$PHI⁵)M⁶)K⁷)L⁸)J⁹)I¹⁰)OUTPUT¹¹)¹²).

$STM1=(²% $PHI:((((¹$STM2(⁰% M,K,L:$PHI⁰)¹)$OMEGA)$VAL$K)$VAL$L)
²).

GCD=(⁰% $VAL$J$2,$VAL$I$2:$STM9⁰).

$STM10=(⁶% $PHI,I$2,J$2,J$1,I$1,OUTPUT:(⁵(⁴(³(²(¹(⁰P1 I$2⁰)J$2¹)
(¹(⁰$PHI I$2⁰)J$2¹)²)J$1³)I$1⁴)OUTPUT⁵)⁶).

$STM11=$ID.

$STM9=(³% $PHI:(((²$STM10(¹$STM11(⁰% I$2,J$2:$PHI⁰)¹)²)$VAL$I$2)
$VAL$J$2)³).

$STM13=(³% $PHI,J,I,OUTPUT:(²(¹(⁰$PHI J⁰)28¹)OUTPUT²)³).

$STM14=(³% $PHI,J,I,OUTPUT:(²(¹(⁰$PHI 7⁰)I¹)OUTPUT²)³).

$STM15=(⁷% $PHI,J,I,OUTPUT:(⁶(⁵(⁴(³(²(⁰GCD I⁰)(¹(⁰$PLUS J⁰)14¹)²
)$PHI³)J⁴)I⁵)OUTPUT⁶)⁷).

$STM16=(⁵% $PHI,J,I,OUTPUT:(⁴% $SPRINT:(³(²(¹(⁰$PHI J⁰)I¹)OUTPUT
²)(($CAT $SPRINT)OUTPUT)³)⁴)⁵).

$STM12=(⁵% $PHI:((((⁴$STM13(³$STM14(²$STM15(¹$STM16(⁰% J,I,
OUTPUT:$PHI⁰)¹)²)³)⁴)$OMEGA)$OMEGA)$OMEGA)⁵).


$PROGRAM=(($STM12 $ID)$ID).

1.15.  Remarks  on  Further  Language  Constructs

Until  now  all  representations  described  have  been  actually 
implemented  in  the  compiler.  Some  remarks  are  in  order  regarding 
how  some  PASCAL  features  for  which  no  lambda  code  is  currently 
generated  by  the  compiler  could  be  translated. 

Labels  can  be  viewed  as  names  for  continuations.  A  goto 
statement  then  merely  substitutes  the  representation  of  the 
referenced  label  for  the  current  program  remainder.  However,  it 
seems  a  very  tedious  task  to  determine  the  continuation  at  a 
given  point  of  a  program  at  compilation  time. 

Function  calls  are  similar  to  procedure  calls.  If  no  side 
effects occur, their representation is actually very simple [2].
Otherwise  many  intermediate  results  have  to  be  introduced 
because  function  calls  can  be  made  repeatedly  within  a  single 
expression.  Their  representation  is  not  theoretically  difficult 
but  it  is  rather  hard  to  actually  implement  their  translation. 

The  modelling  of  procedure  and  function  parameters  as  well 
as  pointer  variables  seems  too  complicated  at  this  stage. 


1.16.  Example#4 

Part  1  is  concluded  with  a  "real"  PASCAL  program  example  to 
multiply  matrices: 


Stmnr  Source code:

       (*$U+,X-  superscripts, no cross reference *)
       PROGRAM MATRIXMULT(INPUT, OUTPUT);
       CONST LB=5; HB=10;
       TYPE RANGE=LB..HB;
            MATRIX=ARRAY [RANGE, RANGE] OF INTEGER;
       VAR A, B, C: MATRIX;
           I, J, K: INTEGER;
       PROCEDURE READWRITE(SWITCH: BOOLEAN; C: MATRIX);
       (* Reads in A and B (global) or prints C *)
       (* according to the logical SWITCH *)
       VAR I, J: INTEGER;
       BEGIN
  2      I:=LB;
  3      WHILE I<=HB DO
  3      BEGIN
  5        J:=LB;
  6        WHILE J<=HB DO
  6        BEGIN
  8          IF SWITCH
  8          THEN BEGIN
 11            GET; A[I, J]:=INPUT@;
 13            GET; B[I, J]:=INPUT@
 13          END
 13          ELSE BEGIN
 15            OUTPUT@:=C[I, J]; PUT
 16          END;
 17          J:=J+1
 17        END;
 18        I:=I+1
 18      END
 18    END; (*READWRITE*)
 18
 18    BEGIN (* of main program *)
 20      READWRITE(TRUE, C); (* C is just dummy *)
 21      I:=LB;
 22      WHILE (I<=HB) DO
 22      BEGIN
 24        J:=LB;
 25        WHILE (J<=HB) DO
 25        BEGIN
 27          C[I, J]:=0;
 28          K:=LB;
 29          WHILE (K<=HB) DO
 29          BEGIN
 31            C[I,J]:=C[I,J]+A[I,K]*B[K,J];
 32            K:=K+1
 32          END;
 33          J:=J+1
 33        END;
 34        I:=I+1
 34      END;
 35      READWRITE(FALSE, C)
 35    END.

*  LAMBDA  CODE  FOR  MATRIXMULT 

READWRITE=(⁰% $VAL$SWITCH,$VAL$C$2:$STM1⁰).

$STM2=(¹²% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)5¹)C$2²)SWITCH³)K⁴)J$1⁵)
I$1⁶)C$1⁷)B⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM5=(¹²% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI 5⁰)I$2¹)C$2²)SWITCH³)K⁴)J$1⁵)
I$1⁶)C$1⁷)B⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM10=(¹⁴% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹³% $SPRINT,$SCARDS:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)
I$2¹)C$2²)SWITCH³)K⁴)J$1⁵)I$1⁶)C$1⁷)B⁸)A⁹)OUTPUT¹⁰)$SCARDS¹¹)
$SPRINT¹²)¹³)¹⁴).

$STM11=(¹²% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)I$2¹)C$2²)SWITCH³)K⁴)J$1⁵)
I$1⁶)C$1⁷)B⁸)(⁷A(⁶(³(²$REPLACE (¹(⁰$MINUS I$2⁰)4¹)²)6³)(⁵(⁴A(³(
²$RETRIEVE (¹(⁰$MINUS I$2⁰)4¹)²)6³)⁴)(⁴(³(²$REPLACE (¹(⁰$MINUS
J$2⁰)4¹)²)6³)INPUT⁴)⁵)⁶)⁷)⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM12=(¹⁴% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹³% $SPRINT,$SCARDS:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)
I$2¹)C$2²)SWITCH³)K⁴)J$1⁵)I$1⁶)C$1⁷)B⁸)A⁹)OUTPUT¹⁰)$SCARDS¹¹)
$SPRINT¹²)¹³)¹⁴).

$STM13=(¹²% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)I$2¹)C$2²)SWITCH³)K⁴)J$1⁵)
I$1⁶)C$1⁷)(⁷B(⁶(³(²$REPLACE (¹(⁰$MINUS I$2⁰)4¹)²)6³)(⁵(⁴B(³(
²$RETRIEVE (¹(⁰$MINUS I$2⁰)4¹)²)6³)⁴)(⁴(³(²$REPLACE (¹(⁰$MINUS
J$2⁰)4¹)²)6³)INPUT⁴)⁵)⁶)⁷)⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)¹²).

$STM9=(⁴% $PHI:(³$STM10(²$STM11(¹$STM12(⁰$STM13 $PHI⁰)¹)²)³)⁴).

$STM15=(¹²% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)I$2¹)C$2²)SWITCH³)K⁴)J$1⁵)
I$1⁶)C$1⁷)B⁸)A⁹)(⁵(⁴C$2(³(²$RETRIEVE (¹(⁰$MINUS I$2⁰)4¹)²)6³)⁴)
(³(²$RETRIEVE (¹(⁰$MINUS J$2⁰)4¹)²)6³)⁵)¹⁰)INPUT¹¹)¹²).

$STM16=(¹⁴% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹³% $SPRINT:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI J$2⁰)I$2¹)C$2²)
SWITCH³)K⁴)J$1⁵)I$1⁶)C$1⁷)B⁸)A⁹)OUTPUT¹⁰)INPUT¹¹)(($CAT $SPRINT)
OUTPUT)¹²)¹³)¹⁴).

$STM14=(²% $PHI:(¹$STM15(⁰$STM16 $PHI⁰)¹)²).

$STM8=(¹⁶% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹⁵(¹⁴(¹³(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(⁰$IF SWITCH⁰)$STM9¹)
$STM14²)$PHI³)J$2⁴)I$2⁵)C$2⁶)SWITCH⁷)K⁸)J$1⁹)I$1¹⁰)C$1¹¹)B¹²)A¹³)
OUTPUT¹⁴)INPUT¹⁵)¹⁶).

$STM17=(¹⁴% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹³(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²$PHI (¹(⁰$PLUS J$2⁰)1¹)²)I$2³)C$2⁴)
SWITCH⁵)K⁶)J$1⁷)I$1⁸)C$1⁹)B¹⁰)A¹¹)OUTPUT¹²)INPUT¹³)¹⁴).

$STM7=(²% $PHI:(¹$STM8(⁰$STM17 $PHI⁰)¹)²).

$STM6=(¹⁷% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹⁶(¹⁵(¹⁴(¹³(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²$IF (¹(⁰$LE J$2⁰)10¹)²)(¹
$STM7(⁰$STM6 $PHI⁰)¹)³)$PHI⁴)J$2⁵)I$2⁶)C$2⁷)SWITCH⁸)K⁹)J$1¹⁰)
I$1¹¹)C$1¹²)B¹³)A¹⁴)OUTPUT¹⁵)INPUT¹⁶)¹⁷).

$STM18=(¹³% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(⁰$PHI J$2⁰)(¹(⁰$PLUS I$2⁰)1¹)²)C$2³)
SWITCH⁴)K⁵)J$1⁶)I$1⁷)C$1⁸)B⁹)A¹⁰)OUTPUT¹¹)INPUT¹²)¹³).

$STM4=(³% $PHI:(²$STM5(¹$STM6(⁰$STM18 $PHI⁰)¹)²)³).

$STM3=(¹⁷% $PHI,J$2,I$2,C$2,SWITCH,K,J$1,I$1,C$1,B,A,OUTPUT,INPUT:
(¹⁶(¹⁵(¹⁴(¹³(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²$IF (¹(⁰$LE I$2⁰)10¹)²)(¹
$STM4(⁰$STM3 $PHI⁰)¹)³)$PHI⁴)J$2⁵)I$2⁶)C$2⁷)SWITCH⁸)K⁹)J$1¹⁰)
I$1¹¹)C$1¹²)B¹³)A¹⁴)OUTPUT¹⁵)INPUT¹⁶)¹⁷).

$STM1=(³% $PHI:(((((²$STM2(¹$STM3(⁰% J$2,I$2,C$2,SWITCH:$PHI⁰)¹)
²)$OMEGA)$OMEGA)$VAL$C$2)$VAL$SWITCH)³).

$STM20=(¹¹% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(³(²(¹(
⁰READWRITE $TRUE⁰)C¹)$PHI²)K³)J⁴)I⁵)C⁶)B⁷)A⁸)OUTPUT⁹)INPUT¹⁰)¹¹
).

$STM21=(⁸% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI K⁰
)J¹)5²)C³)B⁴)A⁵)OUTPUT⁶)INPUT⁷)⁸).

$STM24=(⁸% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI K⁰
)5¹)I²)C³)B⁴)A⁵)OUTPUT⁶)INPUT⁷)⁸).

$STM27=(¹³% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(¹²(¹¹(¹⁰(⁹(⁸(²(¹(⁰
$PHI K⁰)J¹)I²)(⁷C(⁶(³(²$REPLACE (¹(⁰$MINUS I⁰)4¹)²)6³)(⁵(⁴C(³(
²$RETRIEVE (¹(⁰$MINUS I⁰)4¹)²)6³)⁴)(⁴(³(²$REPLACE (¹(⁰$MINUS J⁰)
4¹)²)6³)0⁴)⁵)⁶)⁷)⁸)B⁹)A¹⁰)OUTPUT¹¹)INPUT¹²)¹³).

$STM28=(⁸% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI 5⁰
)J¹)I²)C³)B⁴)A⁵)OUTPUT⁶)INPUT⁷)⁸).

$STM31=(¹⁸% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(¹⁷(¹⁶(¹⁵(¹⁴(¹³(²(¹(⁰
$PHI K⁰)J¹)I²)(¹²C(¹¹(³(²$REPLACE (¹(⁰$MINUS I⁰)4¹)²)6³)(¹⁰(⁴C(
³(²$RETRIEVE (¹(⁰$MINUS I⁰)4¹)²)6³)⁴)(⁹(³(²$REPLACE (¹(⁰$MINUS
J⁰)4¹)²)6³)(⁸(⁶$PLUS (⁵(⁴C(³(²$RETRIEVE (¹(⁰$MINUS I⁰)4¹)²)6³)⁴)
(³(²$RETRIEVE (¹(⁰$MINUS J⁰)4¹)²)6³)⁵)⁶)(⁷(⁶$MULT (⁵(⁴A(³(
²$RETRIEVE (¹(⁰$MINUS I⁰)4¹)²)6³)⁴)(³(²$RETRIEVE (¹(⁰$MINUS K⁰)
4¹)²)6³)⁵)⁶)(⁵(⁴B(³(²$RETRIEVE (¹(⁰$MINUS K⁰)4¹)²)6³)⁴)(³(
²$RETRIEVE (¹(⁰$MINUS J⁰)4¹)²)6³)⁵)⁷)⁸)⁹)¹⁰)¹¹)¹²)¹³)B¹⁴)A¹⁵)
OUTPUT¹⁶)INPUT¹⁷)¹⁸).

$STM32=(¹⁰% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(⁹(⁸(⁷(⁶(⁵(⁴(³(²$PHI (
¹(⁰$PLUS K⁰)1¹)²)J³)I⁴)C⁵)B⁶)A⁷)OUTPUT⁸)INPUT⁹)¹⁰).

$STM30=(²% $PHI:(¹$STM31(⁰$STM32 $PHI⁰)¹)²).

$STM29=(¹³% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(
³(²$IF (¹(⁰$LE K⁰)10¹)²)(¹$STM30(⁰$STM29 $PHI⁰)¹)³)$PHI⁴)K⁵)J⁶)
I⁷)C⁸)B⁹)A¹⁰)OUTPUT¹¹)INPUT¹²)¹³).

$STM33=(⁹% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(⁸(⁷(⁶(⁵(⁴(³(²(⁰$PHI K⁰
)(¹(⁰$PLUS J⁰)1¹)²)I³)C⁴)B⁵)A⁶)OUTPUT⁷)INPUT⁸)⁹).

$STM26=(⁴% $PHI:(³$STM27(²$STM28(¹$STM29(⁰$STM33 $PHI⁰)¹)²)³)⁴).

$STM25=(¹³% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(
³(²$IF (¹(⁰$LE J⁰)10¹)²)(¹$STM26(⁰$STM25 $PHI⁰)¹)³)$PHI⁴)K⁵)J⁶)
I⁷)C⁸)B⁹)A¹⁰)OUTPUT¹¹)INPUT¹²)¹³).

$STM34=(⁸% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(⁷(⁶(⁵(⁴(³(²(¹(⁰$PHI K⁰
)J¹)(¹(⁰$PLUS I⁰)1¹)²)C³)B⁴)A⁵)OUTPUT⁶)INPUT⁷)⁸).

$STM23=(³% $PHI:(²$STM24(¹$STM25(⁰$STM34 $PHI⁰)¹)²)³).

$STM22=(¹³% $PHI,K,J,I,C,B,A,OUTPUT,INPUT:(¹²(¹¹(¹⁰(⁹(⁸(⁷(⁶(⁵(⁴(
³(²$IF (¹(⁰$LE I⁰)10¹)²)(¹$STM23(⁰$STM22 $PHI⁰)¹)³)$PHI⁴)K⁵)J⁶)
I⁷)C⁸)B⁹)A¹⁰)OUTPUT¹¹)INPUT¹²)¹³).

$STM35=(¹(⁰READWRITE $FALSE⁰)C¹).

$STM19=(⁵% $PHI:((((((((⁴$STM20(³$STM21(²$STM22(¹$STM35(⁰% K,J,
I,C,B,A,OUTPUT,INPUT:$PHI⁰)¹)²)³)⁴)$OMEGA)$OMEGA)$OMEGA)(²(¹(⁰
$TUPINIT 2⁰)6¹)6²))(²(¹(⁰$TUPINIT 2⁰)6¹)6²))(²(¹(⁰$TUPINIT 2⁰)6¹)
6²))$OMEGA)$OMEGA)⁵).

$PROGRAM=(($STM19 $ID)$ID).

PART  2:  THE  COMPILER


2.1.  Features  and  Organization 

The  compiler  itself  is  written  in  standard  PASCAL.  It  is  a 
one  pass  translator,  with  the  following  well-distinguished 
execution  phases: 

    lexical scanning by means of a finite state machine.

    attributed LL(1) parsing for syntactic analysis and semantic
    activities, including type-checking procedures.

    generation of lambda-expressions, employing a garbage
    collecting system for character strings of dynamic lengths.

Due to the size and sparseness of the transition tables of
both the finite state automaton and the pushdown machine
corresponding to the LL(1) grammar (73 possible stack symbols
and 50 input tokens), implicit program code was used to realize
the automata.

The  compiler  generates  a  source  program  listing  which 
includes:  accumulated  statement  counts,  accumulated  semicolon 
counts,  block  levels,  depths  of  nested  loops,  and  of  compound 
and  case  statements.  The  compiler  can  also  produce  a  cross- 
reference  of  all  identifiers  with  respect  to  the  semicolon 
counts  of  their  occurrences  and  a  specification  of  their 
explicit  types.  Context-sensitive  error  messages  are  recorded  on 
a  temporary  file  which  is  finally  appended  to  the  source 
listings. Currently, three compiler options are supported which
may be specified in the usual way within comment braces [5]: X±,
S±, and U±. X- will suppress the printing of the cross-reference,
S+ will extend the syntax of the language accepted (see formal
parameters and function declarations), and U+ will cause the
compiler to attach superscripts to paired parentheses in the
code generated. The default values of these options are
"(*$X+,S+,U-*)".

2.2.  The  Lexical  Scanner 


The  lexical  scanner  advances  through  the  input  stream  of 
characters  until  it  recognizes  a  new  token  which  it  passes  to 
the  parser  [7].  There  are  50  different  (parameterized)  tokens 
which  become  the  terminals  of  the  later  LL(1)  grammar.  Some  have 
an  associated  parameter  value.  This  value  does  not  influence  the 
parse  but  is  used  in  later  tasks.  From  a  theoretical  point  of 
view,  each  token  and  its  parameter  value  are  obtained  by  a
single  finite  automaton,  and  the  complete  lexical  scanner  is 
just  a  parallel  composition  of  these.  The  two  most  important 
automata  are  the  identifier  scanner  and  number  scanner. 

2.2.1.  Identifiers 

This compiler distinguishes PASCAL identifiers by their
first ten characters, which are entered into a hashing table and
padded with blanks if necessary. The hashing function is the sum of
the numerical codes of the first, second, fourth and fifth
characters modulo a prime number which is close to half of the
size of the whole table. Hashing collisions are resolved by a
chaining algorithm using the second half of the hashing table as
overflow area. The parameter value of the token IDENTIFIER is
the hashing table index of each recognized identifier. If there
is no danger of ambiguity, identifiers and their corresponding
hashing table indices are not distinguished any further.
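
The hashing scheme described above might look roughly as follows. This is a sketch under stated assumptions, not the compiler's code: the table size, the prime, the use of ord for the numerical character codes, and the layout of the chaining links are all hypothetical choices.

```python
TABLE_SIZE = 200  # hypothetical size of the whole table
PRIME = 97        # hypothetical prime close to half of TABLE_SIZE


class IdentTable:
    """Hash table whose upper part (indices >= PRIME) serves as overflow area."""

    def __init__(self):
        self.names = [None] * TABLE_SIZE
        self.link = [-1] * TABLE_SIZE  # chaining links into the overflow area
        self.next_free = PRIME         # first unused overflow slot

    def lookup(self, ident):
        """Return the table index of ident, entering it if it is new."""
        key = ident[:10].ljust(10)     # first ten characters, blank padded
        h = (ord(key[0]) + ord(key[1]) + ord(key[3]) + ord(key[4])) % PRIME
        if self.names[h] is None:
            self.names[h] = key
            return h
        i = h
        while self.names[i] != key:
            if self.link[i] == -1:     # end of chain: append a new entry
                if self.next_free >= TABLE_SIZE:
                    raise OverflowError("identifier table is full")
                j = self.next_free
                self.next_free += 1
                self.names[j] = key
                self.link[i] = j
                return j
            i = self.link[i]
        return i


table = IdentTable()
print(table.lookup("TEMP") == table.lookup("TEMP"))
```

Two identifiers that agree in their first ten characters receive the same index, and identifiers that merely collide under the hashing function end up chained in the overflow half of the table.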

Keywords  cannot  be  used  as  identifiers.  A  binary  search  is 
conducted  through  an  alphabetically  sorted  table  of  the  35 
standard  PASCAL  keywords,  and  all  but  four  (the  operators  AND, 
MOD,  DIV,  and  IN)  become  tokens  themselves. 
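
A keyword lookup of this kind can be sketched in a few lines of Python. The keyword list below is the set of 35 word symbols of standard PASCAL as commonly tabulated; the function name classify and the returned category names are illustrative assumptions and do not correspond to the compiler's token names.

```python
from bisect import bisect_left

# Alphabetically sorted table of the 35 standard PASCAL keywords.
KEYWORDS = sorted([
    "AND", "ARRAY", "BEGIN", "CASE", "CONST", "DIV", "DO", "DOWNTO",
    "ELSE", "END", "FILE", "FOR", "FUNCTION", "GOTO", "IF", "IN",
    "LABEL", "MOD", "NIL", "NOT", "OF", "OR", "PACKED", "PROCEDURE",
    "PROGRAM", "RECORD", "REPEAT", "SET", "THEN", "TO", "TYPE",
    "UNTIL", "VAR", "WHILE", "WITH",
])

# The four keywords that do not become tokens of their own.
OPERATOR_KEYWORDS = {"AND", "MOD", "DIV", "IN"}


def classify(word):
    """Binary search for word; report how the scanner would treat it."""
    i = bisect_left(KEYWORDS, word)
    if i == len(KEYWORDS) or KEYWORDS[i] != word:
        return "IDENTIFIER"
    if word in OPERATOR_KEYWORDS:
        return "OPERATOR"   # folded into an operator token with a parameter
    return "KEYWORD"        # becomes a token by itself


print(classify("BEGIN"), classify("MOD"), classify("TEMP"))
```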

2.2.2.  Numbers 


The compiler contains an explicit finite automaton to
accept numbers [7]. In the following transition table each
output symbol (denoted by a lower case letter) corresponds to a
certain action specified below the table. The initial state is
1, and the final ("accepting") state is 0:


STATE  vs.  INPUT  CHARACTER 

Actions:

a)  Record a new digit in the integral part of number.

b)  Unsigned integer terminated.

c)  Unsigned real without fractional part encountered.

d)  Process fractional and exponential part in unsigned real number.

e)  If current character = ')' then unsigned integer
    terminated and current character :=
    If current character = '.' then unsigned integer
    terminated and current character := double dot.
    Otherwise proceed like f).

f)  Error in real constant: Digit expected but not found.

Two tokens, viz. UNSGINTEG and UNSGREAL, correspond to
unsigned integers and real numbers, respectively. Their parameters
contain their actual numerical values. Signs will be
distinguished from "adding operators" on a later grammatical
level.


There are six separate tokens for the various PASCAL
operators. However, some may also serve another syntactic
purpose, e.g. EQUALSYM in definitions of constants and types or
PLUSMINUS in signed numbers.


Token:       Meaning of parameter values:

NOTSYM       None.
PLUSMINUS    1: '+', 2: '-'.
ORSYM        None.
MULTOPER     1: '*', 2: '/', 3: DIV, 4: MOD, 5: AND.
EQUALSYM     None.
RELOPER      2: '<>', 3: '<', 4: '>', 5: '<=', 6: '>=', 7: IN.


All literals are collected in a vector of characters, which
is MAXSTRGL long. The parameter field of the token STRINGSYM
identifies the entry of a literal in this vector in the following
fashion:


Parameter value = starting index * MAXSTRGL + length.
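
The packing of starting index and length into one integer can be checked with a short Python sketch. The concrete value of MAXSTRGL and the function names are assumptions made for illustration; the sketch also assumes that every literal is shorter than MAXSTRGL, since otherwise the two components could not be separated again.

```python
MAXSTRGL = 1000  # hypothetical length of the literal vector


def encode(start, length):
    """Pack starting index and length into one STRINGSYM parameter value."""
    if not 0 <= length < MAXSTRGL:
        raise ValueError("length must be smaller than MAXSTRGL")
    return start * MAXSTRGL + length


def decode(param):
    """Recover (starting index, length) from a parameter value."""
    return divmod(param, MAXSTRGL)


param = encode(42, 7)
print(param, decode(param))  # 42007 (42, 7)
```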


The remaining tokens correspond to special symbols without
parameter values:


LPARASYM: '(',  RPARASYM: ')',  LBRACKSYM: '[',  RBRACKSYM: ']',
SEMICSYM: ';',  COMMASYM: ',',  PERIODSYM: '.',  DOUBLEDOT: '..',
COLONSYM: ':',  BECOMES: ':=',  POINTER: PASCAL pointer symbol.


Brackets may also be written as '(.' and '.)'. Comments are
enclosed by braces or by '(*' and '*)'. The pointer symbol of
this implementation is the ampersand.


Before proceeding with a compendious description of the
attributed LL(1) translation, the underlying context-free grammar
itself shall be scrutinized. It consists of 57 non-terminals, 50
terminals (namely all tokens described in section 2.2) and 135
productions. All but one non-terminal yield disjoint selection
sets [7] for different productions. The selection sets of the
productions

(i)  <else clause> ::= ELSESYM <statement>.

(ii)  <else clause> ::= <empty>.

are {ELSESYM} for (i) and {ENDSYM, SEMICSYM, UNTILSYM, ELSESYM}
for (ii). This is a consequence of the well-known ambiguity

if e1 then if e2 then S1 else S2.

By definition [5], each else clause is paired with the last
unmatched then clause. This is equivalent to removing the
ELSESYM from the selection set of (ii). With respect to this
modification the grammar becomes LL(1) [7]. +
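
The effect of removing ELSESYM from the selection set of (ii) can be demonstrated with a tiny recursive-descent sketch in Python. It is an illustration only: tokens are plain strings, conditions and simple statements are single tokens, and the function names are hypothetical.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement; return (tree, position after the statement)."""
    tok = tokens[pos]
    if tok == "IF":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "THEN":
            raise SyntaxError("THEN expected")
        then_part, pos = parse_statement(tokens, pos + 3)
        else_part, pos = parse_else_clause(tokens, pos)
        return ("if", cond, then_part, else_part), pos
    return tok, pos + 1  # a simple statement is a single token here


def parse_else_clause(tokens, pos):
    """Production (i) is chosen whenever the next token is ELSE."""
    if pos < len(tokens) and tokens[pos] == "ELSE":
        return parse_statement(tokens, pos + 1)  # production (i)
    return None, pos                              # production (ii): <empty>


tree, _ = parse_statement(
    ["IF", "e1", "THEN", "IF", "e2", "THEN", "S1", "ELSE", "S2"])
print(tree)  # ('if', 'e1', ('if', 'e2', 'S1', 'S2'), None)
```

Since the innermost pending <else clause> is the first to see the ELSESYM, the else part is attached to the last unmatched then clause, as the definition requires.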

Now  the  complete  grammar  shall  be  given  in  BNF  notation.  In 
addition,  the  selection  set  of  each  production  will  be  specified 
unless  its  right-hand  side  starts  with  a  terminal.  (In  this 
case,  the  terminal  is  the  only  element  of  its  selection  set. ) 
The  starting  symbol  is  <program>: 

(1)  <identifierlist> ::= COMMASYM IDENTIFIER <identifierlist>.

(2)  <identifierlist> ::= <empty>.

Selset(2) = {RPARASYM, SEMICSYM, COLONSYM}.

(3)  <labeldclremainder> ::= COMMASYM UNSGINTEG <labeldclremainder>.

(4)  <labeldclremainder> ::= SEMICSYM.

(5)  <labeldeclaration> ::= LABELSYM UNSGINTEG <labeldclremainder>.

(6)  <labeldeclaration> ::= <empty>.

Selset(6) = {CONSTSYM, TYPESYM, VARSYM, PROCSYM, FUNCSYM,
BEGINSYM}.

(7)  <nonidentconstrem> ::= IDENTIFIER.


+  It is not known to us whether there exists a "pure" LL(1)
grammar for standard PASCAL. E.g. ALGOL 60 is known to be
"inherently non-LL(1)" [6].

(8)  <nonidentconstrem> ::= UNSGINTEG.

(9)  <nonidentconstrem> ::= UNSGREAL.

(10)  <nonidentconstant> ::= PLUSMINUS <nonidentconstrem>.

(11)  <nonidentconstant> ::= UNSGINTEG.

(12)  <nonidentconstant> ::= UNSGREAL.

(13)  <nonidentconstant> ::= STRINGSYM.

(14)  <constant> ::= IDENTIFIER.

(15)  <constant> ::= <nonidentconstant>.

Selset(15) = {UNSGINTEG, PLUSMINUS, UNSGREAL, STRINGSYM}.

(16)  <constantlist> ::= COMMASYM <constant> <constantlist>.

(17)  <constantlist> ::= <empty>.

Selset(17) = {COLONSYM}.

(18)  <constdefinpartrem> ::= IDENTIFIER EQUALSYM <constant>
                              SEMICSYM <constdefinpartrem>.

(19)  <constdefinpartrem> ::= <empty>.

Selset(19) = {TYPESYM, VARSYM, PROCSYM, FUNCSYM, BEGINSYM}.

(20)  <constantdefinpart> ::= CONSTSYM IDENTIFIER EQUALSYM <constant>
                              SEMICSYM <constdefinpartrem>.

(21)  <constantdefinpart> ::= <empty>.

Selset(21) = {TYPESYM, VARSYM, PROCSYM, FUNCSYM, BEGINSYM}.

(22)  <simpletyperemaind> ::= DOUBLEDOT <constant>.

(23)  <simpletyperemaind> ::= <empty>.

Selset(23) = {RPARASYM, SEMICSYM, COMMASYM, RBRACKSYM,
ENDSYM}.

(24)  <simpletype> ::= LPARASYM IDENTIFIER <identifierlist>
                       RPARASYM.

(25)  <simpletype> ::= IDENTIFIER <simpletyperemaind>.

(26)  <simpletype> ::= <nonidentconstant> DOUBLEDOT <constant>.

Selset(26) = {UNSGINTEG, PLUSMINUS, UNSGREAL, STRINGSYM}.


(27)  <simpletypelist> ::= COMMASYM <simpletype> <simpletypelist>.

(28)  <simpletypelist> ::= <empty>.

Selset(28) = {RBRACKSYM}.

(29)  <variant> ::= <constant> <constantlist> COLONSYM LPARASYM
                    <fieldlist> RPARASYM.

Selset(29) = {IDENTIFIER, UNSGINTEG, PLUSMINUS, UNSGREAL,
STRINGSYM}.

(30)  <variant> ::= <empty>.

Selset(30) = {RPARASYM, SEMICSYM, ENDSYM}.

(31)  <variantlist> ::= SEMICSYM <variant> <variantlist>.

(32)  <variantlist> ::= <empty>.

Selset(32) = {RPARASYM, ENDSYM}.

(33)  <tagfieldremainder> ::= COLONSYM IDENTIFIER.

(34)  <tagfieldremainder> ::= <empty>.

Selset(34) = {OFSYM}.

(35)  <fieldlistremaind> ::= SEMICSYM <fieldlist>.

(36)  <fieldlistremaind> ::= <empty>.

Selset(36) = {RPARASYM, ENDSYM}.

(37)  <recordsection> ::= IDENTIFIER <identifierlist> COLONSYM
                          <type>.

(38)  <recordsection> ::= <empty>.

Selset(38) = {RPARASYM, SEMICSYM, ENDSYM}.

(39)  <fieldlist> ::= <recordsection> <fieldlistremaind>.

Selset(39) = {IDENTIFIER, RPARASYM, SEMICSYM, ENDSYM}.

(40)  <fieldlist> ::= CASESYM IDENTIFIER <tagfieldremainder> OFSYM
                      <variant> <variantlist>.

(41)  <unpackstructtype> ::= ARRAYSYM LBRACKSYM <simpletype>
                             <simpletypelist> RBRACKSYM OFSYM
                             <type>.

(42)  <unpackstructtype> ::= RECORDSYM <fieldlist> ENDSYM.

(43)  <unpackstructtype> ::= FILESYM OFSYM <type>.

(44)  <unpackstructtype> ::= SETSYM OFSYM <simpletype>.


(45)  <type> ::= <simpletype>.

Selset(45) = {IDENTIFIER, LPARASYM, UNSGINTEG, PLUSMINUS,
UNSGREAL, STRINGSYM}.

(46)  <type> ::= PACKEDSYM <unpackstructtype>.

(47)  <type> ::= <unpackstructtype>.

Selset(47) = {ARRAYSYM, RECORDSYM, FILESYM, SETSYM}.

(48)  <type> ::= POINTER IDENTIFIER.

(49)  <typedefinpartrem> ::= IDENTIFIER EQUALSYM <type> SEMICSYM
                             <typedefinpartrem>.

(50)  <typedefinpartrem> ::= <empty>.

Selset(50) = {VARSYM, PROCSYM, FUNCSYM, BEGINSYM}.

(51)  <typedefinitionprt> ::= TYPESYM IDENTIFIER EQUALSYM <type>
                              SEMICSYM <typedefinpartrem>.

(52)  <typedefinitionprt> ::= <empty>.

Selset(52) = {VARSYM, PROCSYM, FUNCSYM, BEGINSYM}.

(53)  <variabledclprtrem> ::= IDENTIFIER <identifierlist> COLONSYM
                              <type> SEMICSYM <variabledclprtrem>.

(54)  <variabledclprtrem> ::= <empty>.

Selset(54) = {PROCSYM, FUNCSYM, BEGINSYM}.

(55)  <variabledeclarprt> ::= VARSYM IDENTIFIER <identifierlist>
                              COLONSYM <type> SEMICSYM
                              <variabledclprtrem>.

(56)  <variabledeclarprt> ::= <empty>.

Selset(56) = {PROCSYM, FUNCSYM, BEGINSYM}.

(57)  <formalparameter> ::= IDENTIFIER <identifierlist> COLONSYM
                            IDENTIFIER.

(58)  <formalparameter> ::= VARSYM IDENTIFIER <identifierlist>
                            COLONSYM IDENTIFIER.

If the compiler option S+ is activated, an explicit <type>
will be accepted in formal parameters (productions 57 and 58).
Any implicitly defined scalar identifiers within this <type>
are then global to the scope of the procedure or function body.
No pointer references are forwarded.

(59) <formalparameter> ::= FUNCSYM IDENTIFIER <identifierlist>
                           COLONSYM IDENTIFIER.

(60) <formalparameter> ::= PROCSYM IDENTIFIER <identifierlist>.

(61) <formparameterlist> ::= SEMICSYM <formalparameter>
                             <formparameterlist>.

(62) <formparameterlist> ::= <empty>.

Selset(62) = {RPARASYM}.

(63) <formparameterpart> ::= LPARASYM <formalparameter>
                             <formparameterlist> RPARASYM.

(64) <formparameterpart> ::= <empty>.

Selset(64) = {SEMICSYM, COLONSYM}.

(65) <procfuncdeclarat> ::= PROCSYM IDENTIFIER <formparameterpart>
                            SEMICSYM <block>.

(66) <procfuncdeclarat> ::= FUNCSYM IDENTIFIER <formparameterpart>
                            COLONSYM IDENTIFIER SEMICSYM <block>.

If the compiler option S+ is activated, an explicit <type>
will be accepted in the function return type. Any implicitly
defined scalar identifiers within this <type> are then global to
the scope of the function body. No pointer references are
forwarded.

(67) <procfuncdclpart> ::= <procfuncdeclarat> SEMICSYM
                           <procfuncdclpart>.

Selset(67) = {PROCSYM, FUNCSYM}.

(68) <procfuncdclpart> ::= <empty>.

Selset(68) = {BEGINSYM}.

(69) <expressionlist> ::= COMMASYM <expression>.

(70) <expressionlist> ::= <empty>.

Selset(70) = {RPARASYM, RBRACKSYM}.

(71) <variableselector> ::= LBRACKSYM <expression> <expressionlist>
                            RBRACKSYM <variableselector>.

(72) <variableselector> ::= PERIODSYM IDENTIFIER <variableselector>.

(73) <variableselector> ::= POINTER <variableselector>.

(74) <variableselector> ::= <empty>.

Selset(74) = {RPARASYM, SEMICSYM, COMMASYM, EQUALSYM,
              RBRACKSYM, DOUBLEDOT, BECOMES, DOSYM, OFSYM,
              ORSYM, TOSYM, ENDSYM, ELSESYM, THENSYM,
              UNTILSYM, DOWNTOSYM, PLUSMINUS, RELOPER,
              MULTOPER}.

(75) <actparameterpart> ::= LPARASYM <expression> <expressionlist>
                            RPARASYM.

(76) <identifierremaind> ::= <variableselector>.

Selset(76) = {PERIODSYM, RPARASYM, SEMICSYM, COMMASYM,
              EQUALSYM, LBRACKSYM, RBRACKSYM, DOUBLEDOT,
              DOSYM, OFSYM, ORSYM, TOSYM, ENDSYM, ELSESYM,
              THENSYM, UNTILSYM, DOWNTOSYM, POINTER,
              PLUSMINUS, RELOPER, MULTOPER}.

(77) <identifierremaind> ::= <actparameterpart>.

Selset(77) = {LPARASYM}.

(78) <setelementremaind> ::= DOUBLEDOT <expression>.

(79) <setelementremaind> ::= <empty>.

Selset(79) = {COMMASYM, RBRACKSYM}.

(80) <setelement> ::= <expression> <setelementremaind>.

Selset(80) = {LPARASYM, NOTSYM, STRINGSYM, PLUSMINUS,
              IDENTIFIER, UNSGINTEG}.

(81) <setelementlist> ::= COMMASYM <setelement> <setelementlist>.

(82) <setelementlist> ::= <empty>.

Selset(82) = {RBRACKSYM}.

(83) <setrange> ::= <setelement> <setelementlist>.

Selset(83) = {LPARASYM, NOTSYM, STRINGSYM, PLUSMINUS,
              IDENTIFIER, UNSGINTEG}.

(84) <setrange> ::= <empty>.

Selset(84) = {RBRACKSYM}.

(85) <factor> ::= NOTSYM <factor>.

(86) <factor> ::= IDENTIFIER <identifierremaind>.

(87) <factor> ::= STRINGSYM.

(88) <factor> ::= UNSGINTEG.

(89) <factor> ::= UNSGREAL.

(90) <factor> ::= NILSYM.

(91) <factor> ::= LBRACKSYM <setrange> RBRACKSYM.

(92) <factor> ::= LPARASYM <expression> RPARASYM.

(93) <factorlist> ::= MULTOPER <factor> <factorlist>.

(94) <factorlist> ::= <empty>.

Selset(94) = {RPARASYM, SEMICSYM, COMMASYM, EQUALSYM,
              RBRACKSYM, DOUBLEDOT, DOSYM, OFSYM, ORSYM,
              TOSYM, ENDSYM, ELSESYM, THENSYM, UNTILSYM,
              DOWNTOSYM, PLUSMINUS, RELOPER}.

(95) <term> ::= <factor> <factorlist>.

Selset(95) = {LPARASYM, LBRACKSYM, NILSYM, NOTSYM, STRINGSYM,
              IDENTIFIER, UNSGREAL, UNSGINTEG}.

(96) <termlist> ::= PLUSMINUS <term> <termlist>.

(97) <termlist> ::= ORSYM <term> <termlist>.

(98) <termlist> ::= <empty>.

Selset(98) = {RPARASYM, SEMICSYM, COMMASYM, EQUALSYM,
              RBRACKSYM, DOUBLEDOT, DOSYM, OFSYM, TOSYM,
              ENDSYM, ELSESYM, THENSYM, UNTILSYM, DOWNTOSYM,
              RELOPER}.

(99) <simpleexpression> ::= PLUSMINUS <term>.

(100) <simpleexpression> ::= <term> <termlist>.

Selset(100) = {LPARASYM, LBRACKSYM, NILSYM, NOTSYM,
               STRINGSYM, IDENTIFIER, UNSGREAL, UNSGINTEG}.

(101) <simpleexpressrem> ::= EQUALSYM <simpleexpression>.

(102) <simpleexpressrem> ::= RELOPER <simpleexpression>.

(103) <simpleexpressrem> ::= <empty>.

Selset(103) = {RPARASYM, SEMICSYM, COMMASYM, RBRACKSYM,
               DOUBLEDOT, DOSYM, OFSYM, TOSYM, ENDSYM,
               ELSESYM, THENSYM, UNTILSYM, DOWNTOSYM}.

(104) <expression> ::= <simpleexpression> <simpleexpressrem>.

Selset(104) = {LPARASYM, LBRACKSYM, NILSYM, NOTSYM,
               STRINGSYM, PLUSMINUS, IDENTIFIER, UNSGREAL,
               UNSGINTEG}.

(105) <simplestatemntrem> ::= <variableselector> BECOMES
                              <expression>.

Selset(105) = {PERIODSYM, LBRACKSYM, POINTER, BECOMES}.

(106) <simplestatemntrem> ::= <actparameterpart>.

Selset(106) = {LPARASYM}.

(107) <simplestatemntrem> ::= <empty>.

Selset(107) = {SEMICSYM, ENDSYM, UNTILSYM, ELSESYM}.

(108) <compoundstmntrem> ::= SEMICSYM <statement>
                             <compoundstmntrem>.

(109) <compoundstmntrem> ::= ENDSYM.

(110) <elseclause> ::= ELSESYM <statement>.

(111) <elseclause> ::= <empty>.

Selset(111) = {SEMICSYM, ENDSYM, UNTILSYM}.

(112) <caseelement> ::= <constant> <constantlist> COLONSYM
                        <statement>.

Selset(112) = {IDENTIFIER, UNSGINTEG, PLUSMINUS, UNSGREAL,
               STRINGSYM}.

(113) <caseelement> ::= <empty>.

Selset(113) = {SEMICSYM, ENDSYM}.

(114) <caseelementlist> ::= SEMICSYM <caseelement>
                            <caseelementlist>.

(115) <caseelementlist> ::= ENDSYM.

(116) <repeatstatemntlst> ::= SEMICSYM <statement>
                              <repeatstatemntlst>.

(117) <repeatstatemntlst> ::= UNTILSYM.

(118) <forstatementrem> ::= TOSYM <expression> DOSYM <statement>.

(119) <forstatementrem> ::= DOWNTOSYM <expression> DOSYM
                            <statement>.

(120) <withvariablelist> ::= COMMASYM IDENTIFIER <variableselector>
                             <withvariablelist>.

(121) <withvariablelist> ::= DOSYM.

(122) <unlabeledstatemnt> ::= IDENTIFIER <simplestatemntrem>.

(123) <unlabeledstatemnt> ::= BEGINSYM <statement>
                              <compoundstmntrem>.

(124) <unlabeledstatemnt> ::= IFSYM <expression> THENSYM
                              <statement> <elseclause>.

(125) <unlabeledstatemnt> ::= CASESYM <expression> OFSYM
                              <caseelement> <caseelementlist>.

(126) <unlabeledstatemnt> ::= WHILESYM <expression> DOSYM
                              <statement>.

(127) <unlabeledstatemnt> ::= REPEATSYM <statement>
                              <repeatstatemntlst> <expression>.

(128) <unlabeledstatemnt> ::= FORSYM IDENTIFIER BECOMES
                              <expression> <forstatementrem>.

(129) <unlabeledstatemnt> ::= WITHSYM IDENTIFIER <variableselector>
                              <withvariablelist> <statement>.

(130) <unlabeledstatemnt> ::= GOTOSYM UNSGINTEG.

(131) <unlabeledstatemnt> ::= <empty>.

Selset(131) = {SEMICSYM, ENDSYM, ELSESYM, UNTILSYM}.

(132) <statement> ::= UNSGINTEG COLONSYM <unlabeledstatemnt>.

(133) <statement> ::= <unlabeledstatemnt>.

Selset(133) = {IDENTIFIER, SEMICSYM, ENDSYM, CASESYM, BEGINSYM,
               IFSYM, WHILESYM, REPEATSYM, FORSYM, WITHSYM,
               GOTOSYM, ELSESYM, UNTILSYM}.

(134) <block> ::= <labeldeclaration> <constdefinpart>
                  <typedefinitionprt> <variabledeclarprt>
                  <procfuncdeclarat> BEGINSYM <statement>
                  <compoundstmntrem>.

Selset(134) = {LABELSYM, CONSTSYM, TYPESYM, VARSYM, PROCSYM,
               FUNCSYM, BEGINSYM}.

(135) <program> ::= PROGRAMSYM IDENTIFIER LPARASYM IDENTIFIER
                    <identifierlist> RPARASYM SEMICSYM <block>
                    PERIODSYM.

The transition table of the corresponding one-state
pushdown automaton is about 33 printed pages long. This table was
produced by a program which checks whether a given context-free
grammar is LL(1).
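
The core of such an LL(1) check can be illustrated with a short sketch. It is not the authors' program; it only assumes that the selection sets are already known and tests that the sets of all productions of one non-terminal are pairwise disjoint. The dictionary layout and the name ll1_conflicts are illustrative assumptions. The sets are copied from productions (31)/(32) and (110)/(111).

```python
from itertools import combinations

# Selection sets taken from the grammar listing. A production that
# begins with a terminal has exactly that terminal as its selection set.
SELSETS = {
    "<variantlist>": {
        31: {"SEMICSYM"},
        32: {"RPARASYM", "ENDSYM"},
    },
    "<elseclause>": {
        110: {"ELSESYM"},
        111: {"SEMICSYM", "ENDSYM", "UNTILSYM"},
    },
}


def ll1_conflicts(selsets):
    """Return (nonterminal, prod1, prod2, shared tokens) for every overlap."""
    conflicts = []
    for nonterminal, productions in selsets.items():
        # Compare every pair of productions of the same non-terminal.
        for (p, s1), (q, s2) in combinations(sorted(productions.items()), 2):
            shared = s1 & s2
            if shared:
                conflicts.append((nonterminal, p, q, sorted(shared)))
    return conflicts


if __name__ == "__main__":
    print(ll1_conflicts(SELSETS))
```

An empty result means that one look-ahead token always selects a unique production. Note that Selset(111) does not contain ELSESYM, so an ELSESYM always selects production (110).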


2.4.  An  Attributed  Translation  of  Lists 

The syntax of PASCAL is rich in lists of certain entities
(e.g. identifier list, constant list, simple type list,
expression list, formal parameter list, variant list, etc.).
One general scheme applies throughout the translation. Let us
assume first that the non-terminals <entity>, <list> and <item>
have the following synthesized and inherited attributes [7]:

DESCR........  a data structure containing the description of any
               item (synth.).
FIRST, SEC...  same as DESCR (inher.).
HEAD.........  the head pointer to a list of all item DESCRs
               (synth.).
CAR, CDR.....  pointers to lists of item DESCRs (synth.).

A suitable translation grammar [7] to build a list of item
descriptors follows. Action symbols will be surrounded by
dashes:

(i)   <entity>(HEAD) ::= <item>(FIRST) <list>(FIRST,HEAD).

(ii)  <list>(FIRST,CAR) ::= <separator> <item>(SEC)
                            <list>(SEC,CDR) -action-.

(iii) <list>(FIRST,CAR) ::= <terminator> -CDR:=nil- -action-.

Where -action- means:

-allocate CAR and put FIRST into it; catenate CDR to it-.

This particular grammar yields a simple method for
recovering from a syntactic error: Suppose <list> is called but
neither a <separator> nor a <terminator> appears as the next
input token. Then CAR can be set to nil and the error handler may
advance the input stream to a global synchronization symbol such
as a SEMICSYM (see also section 2.8).
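
Rules (i) to (iii) can be traced with a small recursive-descent sketch. It rests on simplifying assumptions that are not part of the compiler: every <item> is a single token, the separator and terminator are COMMASYM and RPARASYM, and list cells are Python tuples (DESCR, rest) with None standing for nil. The function names are illustrative.

```python
def parse_list(tokens, pos, first,
               separator="COMMASYM", terminator="RPARASYM"):
    """<list>(FIRST, CAR): return (CAR, position after the list)."""
    token = tokens[pos] if pos < len(tokens) else None
    if token == separator and pos + 1 < len(tokens):     # rule (ii)
        sec = tokens[pos + 1]                            # <item>(SEC)
        cdr, pos = parse_list(tokens, pos + 2, sec, separator, terminator)
    elif token == terminator:                            # rule (iii)
        cdr, pos = None, pos + 1                         # -CDR:=nil-
    else:
        # Syntax error: neither separator nor terminator; CAR := nil.
        return None, pos
    # -action-: allocate CAR, put FIRST into it, catenate CDR to it.
    return (first, cdr), pos


def parse_entity(tokens, pos=0):
    """<entity>(HEAD): an <item> followed by a <list>; rule (i)."""
    return parse_list(tokens, pos + 1, tokens[pos])


def flatten(cell):
    """Turn the linked cells into a Python list for inspection."""
    items = []
    while cell is not None:
        items.append(cell[0])
        cell = cell[1]
    return items


if __name__ == "__main__":
    head, pos = parse_entity(["A", "COMMASYM", "B", "RPARASYM"])
    print(flatten(head), pos)
```

Because each item is inherited by the <list> that follows it, the cell for an item is built only after the rest of the list has been parsed, which is why the list comes out in source order.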

2.5.  The  Main  Data  Structures  Needed  in  Translation 

Each block of the source program may introduce a new set of
identifiers which must be discarded when the parsing of this
block is completed. A record describing each new identifier
contains the block level number of this identifier. It is pushed
onto a stack of identifier descriptors as soon as the identifier
it refers to is sufficiently defined. On leaving a block, all
identifier descriptors of its level are popped off this stack.
In order to find an identifier descriptor within this stack, its
position is entered into the identifier's hashing table element.
Should there already be an address of an identifier defined at a
lower block level, a stacking mechanism is invoked. Note that
this algorithm reduces the search time for an identifier to a
look-up in the hashing table. Predefined identifiers are pushed
onto this stack at the very beginning of the parse and possess
the level number zero.
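
The interplay of the descriptor stack and the hashing table may be sketched as follows. This is a minimal illustration, not the compiler's data layout: a Python dict stands in for the hashing table, each table element holds a small stack of positions (the "stacking mechanism"), and the class and method names are assumptions.

```python
class SymbolTable:
    def __init__(self):
        self.stack = []   # identifier descriptors in order of definition
        self.table = {}   # name -> positions in self.stack, innermost last
        self.level = 0    # current block level; 0 holds the predefined names

    def define(self, name, idclass):
        # Enter the descriptor's stack position into the table element.
        self.table.setdefault(name, []).append(len(self.stack))
        self.stack.append({"name": name, "levnr": self.level,
                           "idclass": idclass})

    def lookup(self, name):
        # A single table look-up yields the innermost definition.
        positions = self.table.get(name)
        return self.stack[positions[-1]] if positions else None

    def enter_block(self):
        self.level += 1

    def leave_block(self):
        # Pop all descriptors that belong to the block being left.
        while self.stack and self.stack[-1]["levnr"] == self.level:
            descriptor = self.stack.pop()
            self.table[descriptor["name"]].pop()
        self.level -= 1


if __name__ == "__main__":
    st = SymbolTable()
    st.define("INTEGER", "type")
    st.enter_block()
    st.define("X", "variable")
    print(st.lookup("X"))
```

Redefinition of a name within the same block is not checked in this sketch.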

Six classes of identifiers are distinguished. Identifiers
may denote scalars, types, variables, record fields, procedures,
or functions. An identifier description record contains the
following fields (some will be pointers to a record describing a
certain type which will be discussed below):

IDNR......  the hashing index of the referenced identifier.
LEVNR.....  the block level of its definition.
IDCLASS...  the class it belongs to.

Depending on the value of the field IDCLASS, the descriptor
record contains the following additional fields:

For a scalar identifier:

STYP....  a pointer to its (scalar) type descriptor.
VALUE...  its cardinality.

For a type identifier:

TYP...  a pointer to the type it denotes.

For a variable:

TYP....  a pointer to its type descriptor.
PARM...  a flag signalling whether it is a formal parameter
         and if so what kind of parameter it is (variable or
         value).

For a record field:

TYP...  a pointer to its type record.

For a procedure or function:

ARGTYPS....  a list of type records for its parameters.
SWFORW.....  a flag signalling whether forward declared.
FORWARGS...  a list of identifier descriptors for all formal
             parameters in case of forward declaration. These
             records will be pushed onto the stack as soon as
             the procedure or function body is specified.
RETTYP.....  the function's return type.
PARM.......  a flag similar to the PARM field in a variable.

There are five types, namely scalar, array, record and
pointer types, and the undefined type.* Each type record
contains a switch telling which class the type belongs to and
the following corresponding fields:

* The compiler does not yet support set and file types. They are
treated as undefined types.

For a scalar type:

SCIDNR...........  its type identifier's hashing index.
SCLEVNR..........  the level of its definition.
LOWBND, HIGBND...  its range of cardinalities.

No distinction is made between subrange types and their
matching scalar types (encompassing the full range). The type
INTEGER at level zero ranges over -MAXINT .. MAXINT. The type
REAL at level zero has no associated bounds. If no explicit
type identifier is given, a unique identifier "$IMPLTn"
(where n is a certain number) will be used instead. The
compiler internally translates a label N into a scalar
identifier '$LABELN' and gives it the scalar type 'LABEL' of
level zero ranging from 1 to 9999.

For an array type:

INDXTYP...  the (scalar) type record of the array index.
COMPTYP...  the type record of the array component.
SWPACK....  a flag signalling whether the array is packed.

Matrices and multidimensional arrays will always obtain an
array type as their component types. E.g. array [it1, it2] of
t will become array [it1] of array [it2] of t.
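
This nesting can be sketched by folding the list of index types from right to left. The dictionary representation and the function name are illustrative assumptions; copying the SWPACK flag to every nesting level is likewise an assumption, since the text does not say how the flag is distributed.

```python
def make_array_type(index_types, component, packed=False):
    """Build nested one-dimensional array type records.

    array [it1, it2] of t  becomes  array [it1] of array [it2] of t.
    """
    typ = component
    # The last index type wraps the component first, so fold from the right.
    for indxtyp in reversed(index_types):
        typ = {"class": "array", "INDXTYP": indxtyp,
               "COMPTYP": typ, "SWPACK": packed}
    return typ


if __name__ == "__main__":
    print(make_array_type(["it1", "it2"], "t"))
```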

For a record type:

SECTNS...  a list of field identifier descriptors.
SWPACK...  a flag signalling whether the record is packed.

For a pointer type:

PTRIDNR....  referenced type identifier.
REFERTYP...  referenced type record.

This compiler treats all pointer type references as
forward defined. A list of all unresolved pointer type
records is kept until all type definitions are parsed. Then
the appropriate type record pointers corresponding to PTRIDNR
are assigned to REFERTYP.+

For the undefined type class:

UNDFIDNR...  the undefined type identifier (if known).

From  a  theoretical  point  of  view  all  compiler  routines 
taking  type  descriptors  as  arguments  (e.g.  the  type  checking 
facilities)  are  partial  functions  in  the  sense  that  a  single 
argument  of  the  undefined  type  forces  all  results  to  be  of 
this  type  also. 
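
The absorbing behaviour of the undefined type can be illustrated with a toy type-checking routine. The operator table is a deliberately tiny assumption and does not reflect the compiler's actual rules. Staying silent when an operand is already undefined is also an assumption of this sketch; the text only states that the result is undefined.

```python
UNDEFINED = "UNDEFINED"


def result_type(operator, left, right, errors):
    """Return the result type of a binary operation on two type names."""
    if UNDEFINED in (left, right):
        # A single undefined argument forces an undefined result.
        return UNDEFINED
    if operator in ("+", "-", "*") and left == right == "INTEGER":
        return "INTEGER"
    if operator in ("=", "<>") and left == right:
        return "BOOLEAN"
    errors.append(f"operand types {left} and {right} do not fit {operator}")
    return UNDEFINED


if __name__ == "__main__":
    errors = []
    bad = result_type("+", "INTEGER", "CHAR", errors)
    print(bad, result_type("=", bad, "INTEGER", errors), errors)
```

With this convention one faulty subexpression yields one message, and every enclosing expression simply inherits the undefined type.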

In order to reduce the number of attributes attached to
symbols of the grammar involving the parsing of expressions, a
global stack is constructed to serve the following purpose: Each
stack element contains a flag which is set to false unless the
expression currently derived is a single variable. Immediately
after the parse of an expression is completed the top element of
this stack may be saved for later inspection. Thus non-variable
arguments are discovered in place of variable parameters. The
compiler actually uses a doubly linked list in order to re-use
popped off stack elements.

+ We believe that this is a natural resolution of the following
ambiguity in PASCAL: Given type R = record F: ...; G: @R end,
then G should refer to R itself rather than a different type R
defined on a lower block level.

2.6.  Strings  of  Fluid  Length 

This  compiler  utilizes  a  garbage  collection  system  for 
character  strings  of  fluid  lengths.  By  this  we  mean  that 
whenever  during  manipulation  a  string  would  become  longer  than 
the  space  reserved  for  it,  it  is  re-allocated  and  its  old 
position  released  for  garbage  collection. 

The  compiler  builds  the  object  code  by  a  recursive  process 
very  similar  to  the  representation  rules  of  part  one.  Thus  the 
final  length  of  a  string  of  lambda-calculus  code  can  by  no  means 
be  estimated  beforehand.  The  string  management  subsystem  is 
considered  to  be  an  essential  tool  for  successful  code 
generation.  It  resides  completely  independently  of  the 
compilation  routines,  and  it  can  be  used  whenever  it  is  required 
to  work  with  strings  whose  lengths  cannot  be  determined  prior  to 
their  actual  use. 

The  contents  of  all  strings  in  use  are  put  into  a  common 
area  of  core  --  for  instance  a  very  long  string  itself.  Then 
every  string  may  be  referred  to  by  a  record  containing  its 
starting  address  in  the  string  workspace,  its  current  length, 
the  space  currently  reserved  for  it,  and  a  marker  signalling 
whether  it  is  in  use  or  free  to  be  re-used.  The  system  provides 
procedures  for  allocating  new  strings,  assigning  literals  and 
other  strings  to  strings,  concatenating  strings  and  releasing 
occupied string space. All string description records are linked
together in a list for garbage collection purposes. As soon as a
string is to be allocated but no more workspace is available,
three passes of garbage collection and storage compaction are
attempted to recover space for this request: First, the list of
string descriptors is searched for an unused string large enough
to satisfy the request. If this does not succeed, then as many
unused strings as necessary are removed from the list and the
workspace is properly compacted. Finally, as many strings as
necessary may have their allocated lengths reduced to their
current lengths.
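
The three recovery passes can be sketched as follows. This is an illustration under simplifying assumptions, not the compiler's subsystem: the workspace is a Python list of characters, descriptors are dictionaries kept in address order, pass two removes all unused strings rather than only as many as necessary, and pass three trims every string. The class and method names are hypothetical.

```python
class StringPool:
    def __init__(self, size):
        self.space = [" "] * size   # common workspace for all strings
        self.top = 0                # first position never handed out
        self.descs = []             # descriptor records, in address order

    def allocate(self, reserve):
        if self.top + reserve > len(self.space):
            reused = self._recover(reserve)
            if reused is not None:
                return reused
        if self.top + reserve > len(self.space):
            raise MemoryError("string workspace exhausted")
        desc = {"start": self.top, "length": 0,
                "reserved": reserve, "in_use": True}
        self.top += reserve
        self.descs.append(desc)
        return desc

    def store(self, desc, text):
        assert len(text) <= desc["reserved"]
        start = desc["start"]
        self.space[start:start + len(text)] = list(text)
        desc["length"] = len(text)

    def read(self, desc):
        start = desc["start"]
        return "".join(self.space[start:start + desc["length"]])

    def release(self, desc):
        desc["in_use"] = False      # free to be re-used

    def _recover(self, reserve):
        # Pass 1: look for an unused string that is large enough.
        for desc in self.descs:
            if not desc["in_use"] and desc["reserved"] >= reserve:
                desc["in_use"], desc["length"] = True, 0
                return desc
        # Pass 2: drop unused strings and compact the workspace.
        self.descs = [d for d in self.descs if d["in_use"]]
        self._compact()
        # Pass 3: cut reserved lengths down to current lengths.
        if self.top + reserve > len(self.space):
            for desc in self.descs:
                desc["reserved"] = desc["length"]
            self._compact()
        return None

    def _compact(self):
        pos = 0
        for desc in self.descs:     # strings only ever move towards 0
            length, start = desc["length"], desc["start"]
            self.space[pos:pos + length] = self.space[start:start + length]
            desc["start"] = pos
            pos += desc["reserved"]
        self.top = pos
```

Because callers hold descriptor records rather than raw addresses, compaction may move string contents freely; only the start field of each record changes.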

2.7.  Summary  of  Attributes 

Any symbol of a translation grammar may have one or several
attributes associated with it [7]. The following briefly
describes the meaning of these attributes with respect to the
grammatical symbols, and whether they are inherited or
synthesized:

<identifierlist>(FIRSTID, CARIDLST):

FIRSTID.....  identifier index (inher.).
CARIDLST....  list of identifiers (synth.).

<nonidentconstrem>(SIGN, CONSTYP, CONSVAL):

SIGN........  sign of non-identifier constant (inher.).
CONSTYP.....  pointer to its type record (synth.).
CONSVAL.....  its explicit value (synth.).

<nonidentconstant>(CONSTYP, CONSVAL):

Same as for <nonidentconstrem> (both synth.).

<constant>(CONSTYP, CONSVAL):

Same as for <nonidentconstrem> (both synth.).

<constantlist>(FIRSTTYP, FIRSTVAL, CARCONSLST, MATCHTYP):

FIRSTTYP....  pointer to type record (inher.).
FIRSTVAL....  explicit value of constant (inher.).
CARCONSLST..  list of explicit values of constants (synth.).
MATCHTYP....  pointer to a (scalar) type record which must
              match all constant types incl. FIRSTTYP
              (inher.).

<simpletyperemaind>(FIRSTID, RETTYP, REFID):

FIRSTID.....  identifier which begins <type> (inher.).
RETTYP......  pointer to completed type record (synth.).
REFID.......  index of type identifier which is to be defined
              (if explicit type definition, otherwise zero);
              necessary for scalar types (inher.).

<simpletype>(RETTYP, REFID):

Same as for <simpletyperemaind>.

<simpletypelist>(FIRSTTYP, CARSMPLST, SWPACK):

FIRSTTYP....  pointer to type record (inher.).
CARSMPLST...  list of (scalar) type records (synth.).
SWPACK......  flag whether corresponding array is packed or
              not (inher.).

<variant>(RECLST, MATCHTYP):

RECLST......  list of field identifier descriptors (synth.).
MATCHTYP....  type record of preceding tag field (inher.).

<variantlist>(FIRSTVAR, CARRECLST, MATCHTYP):

FIRSTVAR....  list of field identifier descriptors (inher.).
CARRECLST...  list of field identifier descriptors (synth.).
MATCHTYP....  type record of preceding tag field (inher.).

<tagfieldremainder>(FIRSTID, REC, MATCHTYP):

FIRSTID.....  identifier which begins tag field (inher.).
REC.........  tag field identifier descriptor (if any,
              otherwise nil) (synth.).
MATCHTYP....  pointer to type record of tag field (synth.).

<fieldlistremaind>(RECLST):

RECLST......  list of field identifier descriptors (if any,
              otherwise nil) (synth.).

<recordsection>(RECLST):

Same as for <fieldlistremaind>.

<fieldlist>(FLDLST):

FLDLST......  list of field identifier descriptors (synth.).

<unpackstructtype>(RETTYP, SWPACK, REFID):

RETTYP......  pointer to complete type record (synth.).
SWPACK......  flag whether type is packed or not (inher.).
REFID.......  same as REFID in <simpletyperemaind> (inher.).

<type>(RETTYP, REFID):

Same as in <unpackstructtype>.

<formalparameter>(RETARG, RETACT):

RETARG......  type record pointer list of parameters as
              required in the ARGTYPS field of a procedure or
              function identifier description (synth.).
RETACT......  list of parameter descriptions (synth.).

<formparameterlist>(FIRSTARG, FIRSTACT, CARARGLST, CARACTLST):

FIRSTARG....  list of type record pointers (inher.).
FIRSTACT....  list of identifier record pointers (inher.).
CARARGLST...  list of parameter types (synth.).
CARACTLST...  list of parameter descriptions (synth.).

<formparameterpart>(ARGLST, ACTLST):

ARGLST......  list of all parameter types (if any, otherwise
              nil) (synth.).
ACTLST......  list of all parameter descriptions (if any,
              otherwise nil) (synth.).

<expressionlist>(FIRSTCOD, FIRSTTYP, FIRSTSWVAR, CAREXPLST):

FIRSTCOD....  string pointer (see section 2.6) (inher.).
FIRSTTYP....  type record pointer (inher.).
FIRSTSWVAR..  flag whether expression is a variable or not;
              necessary to determine if an argument is a
              variable (inher.).
CAREXPLST...  list of expression constituents which consist of
              their type records, string pointers to their
              code and "variable flags" like FIRSTSWVAR above
              (synth.).

<variableselector>(SWASSIGN, REPLLST, CODIN, TYPIN,
                   CODOUT, TYPOUT):

SWASSIGN....  flag signalling if called at left hand side of
              assignment (inher.).
REPLLST.....  list of subscripts and array bounds in case
              SWASSIGN is true; needed to compile according to
              the last representation rule of section 1.7
              (inher. and synth.).
CODIN.......  string pointer to current code (inher.).
TYPIN.......  type record pointer of current type (inher.).
CODOUT......  string pointer to new code (synth.).
TYPOUT......  type record pointer of new type (synth.).

<actparameterpart>(ACTEXPLST):

ACTEXPLST...  same as CAREXPLST in <expressionlist>.

<identifierremaind>(ID, CODOUT, TYPOUT):

ID..............  identifier in front of it (inher.).
CODOUT, TYPOUT..  same as in <variableselector>.

<factor>(FACCOD, FACTYP):

FACCOD......  string pointer to code of factor (synth.).
FACTYP......  type record pointer of its type (synth.).

<factorlist>(PRIORCOD, PRIORTYP, RETCOD, RETTYP):

PRIORCOD....  string pointer (inher.).
PRIORTYP....  type record pointer (inher.).
RETCOD......  string pointer to complete code of factors
              (synth.).
RETTYP......  type record pointer of their resulting type
              (synth.).

<term>(TERMCOD, TERMTYP):

Similar to attributes of <factor>.

<termlist>(PRIORCOD, PRIORTYP, RETCOD, RETTYP):

Similar to attributes of <factorlist>.

<simpleexpression>(SEXPCOD, SEXPTYP):

Similar to attributes of <factor>.

<simpleexpressrem>(PRIORCOD, PRIORTYP, RETCOD, RETTYP):

Similar to attributes of <factorlist>.

<expression>(EXPCOD, EXPTYP):

EXPCOD......  string pointer to expression code (synth.).
EXPTYP......  pointer to its type record (synth.).

<simplestatemntrem>(ID):

ID..........  identifier in front of it (inher.).

<compoundstmntrem>(FIRSTSTMNR, COMPSTMCOD):

FIRSTSTMNR..  statement number (inher.).
COMPSTMCOD..  string pointer to code of compound statement
              (see also section 1.6) (inher. and synth.).

<elseclause>(ELSESTMNR):

ELSESTMNR...  statement number (inher.).

<caseelement>(CONSLST, MATCHTYP):

CONSLST.....  list of explicit values of case labels (synth.).
MATCHTYP....  pointer to a (scalar) type record which must
              match all case label types (inher.).

<caseelementlist>(FIRSTCONSLST, CARCONSLST, MATCHTYP):

FIRSTCONSLST..  list of expl. values of case labels (inher.).
CARCONSLST....  list of expl. values of all case labels (synth.).
MATCHTYP......  same as in <caseelement>.

<forstatementrem>(CONTRTYP):

CONTRTYP....  type rec. pointer of control variable (inher.).

Each terminal (viz. each token) has only one (synthesized)
attribute: its parameter value, if any was assigned.

2.8.  Error  Diagnostics  and  Error  Recovery 

The error diagnostic routines take advantage of the LL(1)
property of the underlying grammar. Illegal tokens are
discovered as soon as they are obtained [7], namely if they are
not contained in the selection set of a non-terminal which is to
be derived. A typical error message is thus

"VARIABLESELECTOR STARTS WITH IDENTIFIER"

or  if  terminals  do  not  match 

EXPECTED,  BUT  :  FOUND" 

All other error messages are adjusted to standard PASCAL [5],
though their text usually includes some specific information
such as an incorrectly used identifier. The error messages are
enumerated according to [5].

The error recovery is probably the most complicated process
of this compiler. Some general guidelines are explained now, but
for more details the reader is referred to the compiler source.
The lexical scanner treats illegal characters like blank spaces.
On encountering a bad token, the parser proceeds until it finds
a synchronizing symbol (SEMICSYM, ENDSYM, ELSESYM, UNTILSYM).
Then it ignores all symbols of the current derivation ("pops the
stack") until continuation by the synchronizing token is
possible. Some tokens should not be passed during the
synchronization (PROCSYM, FUNCSYM, RECORDSYM, BEGINSYM, CASESYM,
REPEATSYM), because the essential program structure would be
lost. In such cases the compilation is halted. Semantic errors
are repaired quite thoroughly. A separate undefined type was
introduced for this purpose (see also UNDFIDNR in section 2.5).
However, in some cases the type might be constructed from the
context.
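
The syntactic part of this recovery can be sketched as follows. The sketch is a simplification and not the compiler's routine: the parse stack is a Python list with its top at the end, the dictionary selects maps each non-terminal to the tokens with which it may continue (a terminal continues only with itself), and the names recover and CompilationHalted are illustrative.

```python
SYNC = {"SEMICSYM", "ENDSYM", "ELSESYM", "UNTILSYM"}
STOP = {"PROCSYM", "FUNCSYM", "RECORDSYM", "BEGINSYM", "CASESYM",
        "REPEATSYM"}


class CompilationHalted(Exception):
    """Raised when skipping would lose the essential program structure."""


def recover(tokens, pos, stack, selects):
    """Skip to a synchronizing token, then pop the stack until it fits.

    Returns the position of the synchronizing token; stack is changed
    in place.
    """
    # Advance the input up to the next synchronizing symbol.
    while pos < len(tokens) and tokens[pos] not in SYNC:
        if tokens[pos] in STOP:
            raise CompilationHalted(tokens[pos])
        pos += 1
    if pos == len(tokens):
        raise CompilationHalted("end of input")
    sync = tokens[pos]
    # Ignore symbols of the current derivation ("pop the stack") until
    # the parse can continue with the synchronizing token.
    while stack and sync not in selects.get(stack[-1], {stack[-1]}):
        stack.pop()
    return pos


if __name__ == "__main__":
    selects = {"<compoundstmntrem>": {"SEMICSYM", "ENDSYM"}}
    stack = ["<compoundstmntrem>", "<expression>", "BECOMES"]
    tokens = ["IDENTIFIER", "IDENTIFIER", "SEMICSYM", "IDENTIFIER"]
    print(recover(tokens, 1, stack, selects), stack)
```

In the usage example the parser expected BECOMES but found a second IDENTIFIER; the remainder of the assignment is dropped and parsing resumes with production (108) at the SEMICSYM.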


2.9. Examples

The  following  sample  program  contains  many  errors  which  the 
compiler  detected  and  reported: 


STMNR  LEV  NST  SEMIC  SOURCE CODE:

    0    0    0      1  PROGRAM ERRONEOUS(INPUT);
    0    1    0      1  (* This program tests error diagnostics
    0    1    0      1  and error recovery *)
    0    1    0      2  LABEL 1, 2, 1;
    0    1    0      4  CONST C1='A'; C2='YZ';
    0    1    0      6  TYPE SR1=C1..'Z'; SR2=-5..0;
    0    1    0      7  PTR=@REC2;
    0    1    1      8  REC1=RECORD RF1, RF2: CHAR;
    0    1    1      8  CASE PTR OF
    0    1    1      9  'B': (RF3: @REC1);
    0    1    1      9  'Z': (RF1: INTEGER)
    0    1    1     10  END;
    0    1    0     12  VAR V1: BOOLEAN; V2: @REC1;
    0    1    0     13  V3: (TRUE, FALSE);
    0    1    0     13  V4: PACKED ARRAY (.SR1, (ONE, TWO).) OF
    0    1    1     14  RECORD VF1: ONE..TWO;
    0    1    2     16  VF2: RECORD VF3: @REC1; END;
    0    1    1     17  END;
    0    1    0     18  C2: INTEGER;
    0    1    0     20  PROCEDURE P(A: SR2); FORWARD;
    0    1    0     20  (*$S+ allow extented syntax *)
    0    2    0     21  FUNCTION F(VAR P1, P2: BOOLEAN): 1..100;
    0    1    0     22  FORWARD; (*$S- inhibit extensions *)
    0    2    0     23  PROCEDURE P(A: SR2);
    2    2    1     24  BEGIN NEW(V2, 'B');
    3    2    1     24  3: IF TRUE
    4    2    1     25  THEN V4(.C2, ONE.).VF2.VF5:=V2;
    5    2    1     26  P( F(NOT V1, V1) );
    6    2    1     27  END;
    8    1    1     28  BEGIN PUT;
    9    1    1     28  FOR V4:=3 DOWNTO -5 DO
   10    1    2     28  CASE V2@.RF1 OF
   11    1    2     29  'A', 'B': PP(-23);
   12    1    2     30  'C': GOTO 4;
   13    1    2     32  'D', 'A':;;
   14    2    2     32  'E': WITH V4(.'C', THREE.), VF2 DO
   15    3    2     33  VF3@.RF3@.RF1 := SR2;
   15    1    2     34  END;
   16    1    1     35  IF F( V1 ) <>5 AND C1="" THEN I:=I + 1;
   17    1    1     36  X:=5.E-7;
   18    1    1     36  END.

REF  IDENTIFIER  CLASS, TYPE, REFERENCES:          ***ERRONEOUS***

  1  $LABEL1     SCALAR, LABEL 0 .. 9999 (ORDERS ONLY) 2,
  1  $LABEL2     SCALAR, LABEL 0 .. 9999 (ORDERS ONLY)
 24  $LABEL3     SCALAR, *** UNDEFINED ***
 29  $LABEL4     SCALAR, *** UNDEFINED ***
 18  A           VARIABLE, INTEGER -5 .. 0 (ORDERS ONLY)
 10  BOOLEAN     TYPE, BOOLEAN 0 .. 1 (ORDERS ONLY) 20,
  7  CHAR        TYPE, CHAR 0 .. 255 (ORDERS ONLY)
  2  C1          SCALAR, CHAR 0 .. 255 (ORDERS ONLY) 4, 34,
  3  C2          SCALAR, PACKED ARRAY (. INTEGER 1 .. 2 (ORDERS ONLY) .)
                 OF CHAR 0 .. 255 (ORDERS ONLY) 18, 24,
 20  F           ENTRY( VAR, BOOLEAN 0 .. 1 (ORDERS ONLY) ;
                 VAR, BOOLEAN 0 .. 1 (ORDERS ONLY) ; ) :
                 INTEGER 1 .. 100 (ORDERS ONLY)
                 *** UNRESOLVED FORWARD REFERENCE *** 25, 34,
 12  FALSE       SCALAR, $IMPLT1 0 .. 1 (ORDERS ONLY)
  0  INPUT       VARIABLE, @INTEGER
  9  INTEGER     TYPE, INTEGER -2147483647 .. 2147483647 (ORDERS ONLY) 17,
 13  ONE         SCALAR, $IMPLT2 0 .. 1 (ORDERS ONLY) 13, 24,
 27  OUTPUT      VARIABLE, *** UNDEFINED ***
 18  P           ENTRY( INTEGER -5 .. 0 (ORDERS ONLY) ; ) 23, 25,
 28  PP          ENTRY( INTEGER -2147483647 .. 2147483647 (ORDERS ONLY) ; )
  6  PTR         TYPE, @REC2 8,
  7  REC1        TYPE, RECORD RF2 :, RF1 :, RF3 :, RF1 :, 8, 11, 14,
  6  REC2        TYPE, REC2 , *** UNDEFINED ***
  8  RF1         RECORD FIELD, CHAR 0 .. 255 (ORDERS ONLY) 28, 32,
  9  RF1         RECORD FIELD, INTEGER -2147483647 .. 2147483647
                 (ORDERS ONLY)
  8  RF2         RECORD FIELD, CHAR 0 .. 255 (ORDERS ONLY)
  8  RF3         RECORD FIELD, @REC1 32,
  4  SR1         TYPE, CHAR 193 .. 233 (ORDERS ONLY) 13,
  5  SR2         TYPE, INTEGER -5 .. 0 (ORDERS ONLY) 18, 22, 32,
 32  THREE       VARIABLE, *** UNDEFINED ***
 12  TRUE        SCALAR, $IMPLT1 0 .. 1 (ORDERS ONLY) 24,
 13  TWO         SCALAR, $IMPLT2 0 .. 1 (ORDERS ONLY) 13,
 32  VF1         RECORD FIELD, $IMPLT2 0 .. 1 (ORDERS ONLY)
 14  VF1         RECORD FIELD, $IMPLT2 0 .. 1 (ORDERS ONLY)
 32  VF2         RECORD FIELD, RECORD VF3 :, 32,
 16  VF2         RECORD FIELD, RECORD VF3 :, 24,
 32  VF3         RECORD FIELD, @REC1
 15  VF3         RECORD FIELD, @REC1
 10  V1          VARIABLE, BOOLEAN 0 .. 1 (ORDERS ONLY) 25, 25, 34
 11  V2          VARIABLE, @REC1 23, 24, 28,
 12  V3          VARIABLE, $IMPLT1 0 .. 1 (ORDERS ONLY)
 13  V4          VARIABLE, PACKED ARRAY (. CHAR 193 .. 233 (ORDERS ONLY) .)
                 OF PACKED ARRAY (. $IMPLT2 0 .. 1 (ORDERS ONLY) .)
                 OF RECORD VF1 :, VF2 :, 24, 28, 32,
 35  X           VARIABLE, *** UNDEFINED ***

ERRNR  SEMIC  COL  ERROR MESSAGE LISTING:          ***ERRONEOUS***

  101      2   44  IDENTIFIER '$LABEL1' DECLARED TWICE
  398      8   13  VARIANTS WILL BE TREATED AS RECORDS
  110      8   44  TAGFIELD TYPE MUST BE SCALAR OR SUBRANGE
  101      9   11  TWO RECORDFIELDS 'RF1'
  104     10    5  IDENTIFIER 'REC2' UNDECLARED
  101     18   44  IDENTIFIER 'C2' DECLARED TWICE
  119     23    8  FORW. DCL.: MUST NOT REPEAT ARGUMENT LIST
  398     24   44  TAGFIELDVALUES IN PROC. 'NEW' IGNORED
  104     24    5  IDENTIFIER '$LABEL3' UNDECLARED
  135     24   12  TYPE OF OPERAND MUST BE BOOLEAN
  134     24   26  TYPE CONFLICT: 'SCALAR' VERSUS 'ARRAY'
  152     24   36  RECORD FIELD 'VF5' NOT FOUND
  154     25   22  ACTUAL PARAMETER MUST BE A VARIABLE
  134     26   23  TYPE CONFLICT: 'SCALAR' VERSUS 'SCALAR'
  104     28   44  IDENTIFIER 'OUTPUT' UNDECLARED
  143     28   11  ILLEGAL TYPE OF LOOP CONTROL VARIABLE
  104     28   18  IDENTIFIER 'PP' UNDECLARED
  104     30   44  IDENTIFIER '$LABEL4' UNDECLARED
  104     32   31  IDENTIFIER 'THREE' UNDECLARED
  103     33   44  IDENTIFIER 'SR2' OF WRONG CLASS
  156     34   44  MULTIDEFINED CASE LABEL
  126     34   12  ACTUAL NUMBER OF ARGUMENTS UNEQUALS DCL.
  134     34   21  TYPE OF OPERAND(S) MUST BE BOOLEAN
   25     34   21  THEN EXPECTED, BUT = FOUND
  104     35    5  IDENTIFIER 'X' UNDECLARED
  201     35    7  ERROR IN REAL CONSTANT: DIGIT EXPECTED
  398     35    7  REAL NUMBERS ARE NOT IMPLEMENTED
   26     35    8  FACTORLIST STARTS WITH IDENTIFIER
  167     36   44  UNSPECIFIED LABEL '4'
  167     36   44  UNSPECIFIED LABEL '2'
  167     36   44  UNSPECIFIED LABEL '1'
  117     36   44  UNSATISF. FORWARD REFERENCE 'F'


2.10.  Performance  and  Implementation  Notes 


The  compiler  is  a  single  PASCAL  program.  It  consists  of
5400  lines  of  source  code,  and  its  object  module,  generated  by
the  PASCAL  8000  (RPI  Version)  compiler,  occupies  160K  bytes  of
storage,  excluding  a  variable-sized  run-time  stack.  The  compiler
is  structured  into  102  procedures  and  functions.  The  compiler
took  1.71  seconds  to  compile  the  program  in  section  1.16  on
an  IBM  3033  machine  running  under  the  Michigan  Terminal  System.

The  compiler  program  uses  three  external  files:  INPUT  for
the  program  to  be  compiled,  OUTPUT  for  compiler  listings  and
messages,  and  SPUNCH  for  the  lambda-calculus  code  produced.  Only
the  first  100  characters  of  an  input  line  are  analyzed,  and  the
maximum  number  of  characters  on  code  lines  is  currently  set  to
72,  but  both  limits  can  be  changed  easily.  The  routines
generating  a  cross-reference  were  added  for  debugging  purposes
only  and  are  coded  rather  inefficiently.
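
The  two  line-length  limits  mentioned  above  can  be  pictured  as
named  constants.  The  following  is  only  a  minimal  sketch  in  a
present-day  PASCAL  dialect,  not  the  compiler's  actual  code:  the
names  MaxInLine,  MaxCodeLine  and  Emit  are  hypothetical,  and
copying  the  analyzed  characters  to  OUTPUT  merely  stands  in  for
real  analysis  and  code  generation.

```pascal
program LineLimits(input, output);

const
  MaxInLine   = 100;  { hypothetical name: input characters analyzed per line }
  MaxCodeLine = 72;   { hypothetical name: characters per emitted code line }

var
  Col: integer;       { characters already written on the current code line }
  InCol: integer;     { characters read so far from the current input line }
  Ch: char;

{ Write one character of code, starting a new line when the current one is full. }
procedure Emit(C: char);
begin
  if Col = MaxCodeLine then
    begin
      writeln;
      Col := 0
    end;
  write(C);
  Col := Col + 1
end;

begin
  Col := 0;
  while not eof do
    begin
      InCol := 0;
      while not eoln do
        begin
          read(Ch);
          InCol := InCol + 1;
          { characters beyond the limit are read but ignored }
          if InCol <= MaxInLine then
            Emit(Ch)
        end;
      readln
    end;
  if Col > 0 then
    writeln
end.
```

Changing  either  limit  then  amounts  to  editing  one  constant.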

The  following  identifiers  are  pre-defined  as  in  standard
PASCAL:  BOOLEAN,  CHAR,  CHR,  FALSE,  GET,  INTEGER,  ORD,  PRED,  PUT,
REAL,  SUCC,  TRUE.  INPUT  and  OUTPUT  are  files  of  INTEGER  when
specified  as  program  parameters.

During  the  development  of  the  compiler,  the  parser  was
actually  generated  automatically  using  a  computer  program.  This
program  employs  recursive  PASCAL  procedures,  one  for  each
nonterminal,  in  place  of  a  pushdown  stack  [7].  Synthesized  and
inherited  attributes  become  variable  and  value  parameters,
respectively.  A  global  switch  is  set  if  the  parsing  process
attempts  to  recover  from  an  erroneous  input  token.  In  this  case,
the  body  of  a  procedure  is  skipped  if  its  corresponding  symbol
is  supposed  to  be  popped  off  the  stack.  It  should  be  noted  that
this  is  just  a  method  of  coding  an  LL(1)  parser  and  must  not  be
confused  with  recursive  descent  methods  [9].
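
The  coding  method  just  described  can  be  illustrated  on  a  toy
grammar.  The  following  is  a  minimal  sketch  under  stated
assumptions,  not  an  excerpt  from  the  compiler:  the  grammar,  all
identifiers  (Expr,  Rest,  Term,  Match,  SkipBody,  Recovering)  and
the  simple  recovery  policy  are  hypothetical,  and  a  present-day
dialect  with  a  string  type  (e.g.  Free  Pascal)  is  assumed.
Synthesized  attributes  are  VAR  parameters,  the  inherited
attribute  Left  is  a  value  parameter,  and  a  procedure  body  is
skipped  while  the  global  switch  is  set  and  the  current  token
cannot  start  the  corresponding  symbol.

```pascal
program LL1Sketch;

{ Hypothetical grammar, chosen only for illustration:
    Expr -> Term Rest
    Rest -> '+' Term Rest | empty
    Term -> digit | '(' Expr ')'
  The character '$' marks the end of the input. }

var
  Src: string;          { sample input text }
  Cur: integer;         { index of the current token in Src }
  Tok: char;            { current one-character token }
  Recovering: boolean;  { the global error-recovery switch }
  Errors: integer;      { number of errors reported }
  Answer: integer;      { synthesized value of the whole expression }

procedure NextTok;
begin
  Cur := Cur + 1;
  if Cur <= Length(Src) then Tok := Src[Cur] else Tok := '$'
end;

procedure Error;
begin
  Errors := Errors + 1;
  Recovering := true
end;

{ True when a procedure body must be skipped, i.e. its symbol is popped.
  Recovery ends as soon as the current token can start the symbol. }
function SkipBody(StartsHere: boolean): boolean;
begin
  if Recovering and StartsHere then Recovering := false;
  SkipBody := Recovering
end;

{ Terminal symbol: a match consumes the token and ends recovery;
  a mismatch during recovery simply pops the terminal. }
procedure Match(Expected: char);
begin
  if Tok = Expected then
    begin Recovering := false; NextTok end
  else if not Recovering then
    Error
end;

procedure Expr(var V: integer); forward;

{ Term -> digit | '(' Expr ')'        V: synthesized attribute }
procedure Term(var V: integer);
begin
  V := 0;
  if not SkipBody(Tok in ['0'..'9', '(']) then
    if Tok in ['0'..'9'] then
      begin V := Ord(Tok) - Ord('0'); NextTok end
    else if Tok = '(' then
      begin NextTok; Expr(V); Match(')') end
    else
      Error
end;

{ Rest -> '+' Term Rest | empty       Left: inherited, V: synthesized }
procedure Rest(Left: integer; var V: integer);
var Right: integer;
begin
  V := Left;
  if not SkipBody(Tok in ['+', ')', '$']) then
    if Tok = '+' then
      begin NextTok; Term(Right); Rest(Left + Right, V) end
    else if not (Tok in [')', '$']) then
      Error
end;

{ Expr -> Term Rest }
procedure Expr(var V: integer);
var Left: integer;
begin
  V := 0;
  if not SkipBody(Tok in ['0'..'9', '(']) then
    begin Term(Left); Rest(Left, V) end
end;

begin
  Src := '2+(3+4)+5';
  Cur := 0; Errors := 0; Recovering := false;
  NextTok;
  Expr(Answer);
  if Tok <> '$' then Error;
  writeln('value = ', Answer, ', errors = ', Errors)
end.
```

For  the  sample  input  the  sketch  should  report  the  value  14  and
no  errors.  Skipping  of  erroneous  input  tokens,  which  a  full
recovery  scheme  would  also  need,  is  deliberately  left  out.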

The  compiler  itself  is  successfully  processed  by  the 
parser. 